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The present invention provides methods for identifying and representing the biological pathways of drug action on a cell by: (i) 
measuring responses of cellular constituents to graded exposures of the cell to a drug of interest; (ii) measuring the responses of cellular 
constituents to perturbations in one or more biological pathways of the cell; and (iii) scaling a combination of the measured pathway 
responses to fit the measured drug responses best according to an objective measure. In alternative embodiments, the present invention also 
provides for assessing the significance of the identified representation and for verifying that the identified pathways are actual pathways 
of drug action. In various embodiments, the effects on the cell can be determined by measuring gene expression, protein abundances, 
protein activities, or a combination of such measurements. In various embodiments, perturbation to a biological pathway in the cell can t>e 
made by use of titratable expression systems, use of transfection systems, modification to abundances of pathway RNAs, modifications to 
abundances of pathway proteins, or modifications to activities of the pathway proteins. The present invention also provides methods for 
drug development based on the methods for identifying biological pathways of drug action, and methods for representing the biological 
pathways involved in the effect of an environmental change upon a cell. 



ATTORNEY DOCKET NUMBER: 9301-123 
SERIAL NUMBER: 09/724,538 
REFERENCE: CP 



FOR THE PURPOSES OF INFORMATION ONLY 



Codes used to identify States party to the PCT on the front pages of pamphlets publishing international applications under the PCT. 



AL 


Albania 


ES 


Spain 


LS 


Lesotho 


SI 


Slovenia 


AM 


Armenia 


FI 


Finland 


LT 


Lithuania 


SK 


Slovakia 


AT 


Austria 


FR 


France 


LU 


Luxembourg 


SN 


Senegal 


AU 


Australia 


GA 


Gabon 


LV 


Latvia 


sz 


Swaziland 


AZ 


Azerbaijan 


GB 


United Kingdom 


MC 


Monaco 


TD 


Chad 


BA 


Bosnia and Herzegovina 


GE 


Georgia 


MD 


Republic of Moldova 


TG 


Togo 


BB 


Barbados 


GH 


Ghaiu 


MG 


Madagascar 


TJ 


Tajikistan 


BE 


Belgium 


GN 


Guinea 


MK 


The former Yugoslav 


TM 


Turkmenistan 


BF 


Burkina Faso 


GR 


Greece 




Republic of Macedonia 


TR 


Turkey 


BG 


Bulgaria 


HU 


Hungary 


ML 


Mali 


TT 


Trinidad and Tobago 


BJ 


Benin 


IK 


Ireland 


MN 


Mongolia 


UA 


Ukraine 


BR 


Brazil 


IL 


Israel 


MR 


Mauritania 


UG 


Uganda 


BY 


Belarus 


IS 


Iceland 


MW 


Malawi 


US 


United States of America 


CA 


Canada 


IT 


Italy 


MX 


Mexico 


uz 


Uzbekistan 


CF 


Central African Republic 


JP 


Japan 


NE 


Niger 


VN 


Viet Nam 


CG 


Congo 


KE 


Kenya 


NL 


Netherlands 


YU 


Yugoslavia 


CH 


Switzerland 


KG 


Kyrgyzstan 


NO 


Norway 


ZW 


Zimbabwe 


a 


Cflte d'Woire 


KP 


Democratic People'* 


NZ 


New Zealand 






CM 


Cameroon 




Republic of Korea 


PL 


Poland 






CN 


China 


KR 


Republic of Korea 


PT 


Portugal 






cu 


Cuba 


KZ 


Kazakstan 


RO 


Romania 






cz 


Czech Republic 


LC 


Saint Ludt 


RU 


Russian Federation 






DE 


Germany 


U 


Liechtenstein 


SD 


Sudan 






DK 


Denmark 


LK 


Sri Lanka 


SE 


Sweden 






EE 


Estonia 


LR 


Liberia 


SG 


Singapore 







WO 99/58708 



PCT/US99/I0050 



METHODS FOR IDENTIFYING PATHWAYS OF DRUG ACTTOM 

1 FIELD OF THE INVENTION 
5 The field of this invention relates to methods for 

characterizing the action of drugs in cells, in particular 
for finding biological pathways in a cell affected by drug 
action, as well as application of these methods to drug 
discovery. 

10 

2 BACKGROUND 

The identification of the biological pathway of action 
of a drug or drug candidate is a problem of great commercial 
and human importance. Although the primary molecular target 

15 of and cellular pathways affected by a drug are often known 
or suspected because the drug was originally selected by a 
specific drug screen, it is important to verify its action on 
such a primary pathway and to quantify its action along other 
secondary pathways which may be harmful, or may be 

20 beneficial, often in unsuspected ways. In other cases, the 
primary pathways of drug action are unknown, and these must 
be determined. 

This information is important in many areas of practical 
research, such as, for example, drug discovery, which is a 

25 process by which bioactive compounds are identified and 
preliminarily characterized. Drug discovery is a critical 
step in the development of treatments for human diseases. 
Two approaches presently dominate the search for new drugs. 
The first begins with a screen for compounds that have a 

30 desired effect on a cell (e.g., induction of apoptosis) , or 
organism (e.g., inhibition of angiogenesis) as measured in a 
specific biological assay. Compounds with the desired 
activity may then be modified to increase potency, stability, 
or other properties, and the modified compounds retested in 

35 the assay. Thus, a compound that acts as an inhibitor of 
angiogenesis when tested in a mouse tumor model may be 
identified, and structurally related compounds synthesized 
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and tested in the same assay. One limitation of this 
approach is that, often, the mechanisms of action, such as 
the molecular target (s) and cellular pathways affected by the 
compound, are unknown, and cannot be determined by the 
5 screen. In addition, the assay may provide little 

information about the specificity, either in terms of targets 
or pathways, of the drug's effect. Finally, the number of 
compounds that can be screened by assaying biological effects 
on cells or animals is limited by the required experimental 
10 efforts. 

In contrast, the second approach to drug screening 
involves testing numerous compounds for a specific effect on 
a known molecular target, typically a cloned gene sequence or 
an isolated enzyme or protein. For example, high-throughput 

15 assays can be developed in which numerous compounds can be 
tested for the ability to change the level of transcription 
from a specific promoter or the binding of identified 
proteins. Although the use of high-throughput screens is a 
powerful methodology for identifying drug candidates, it has 

20 limitations. A major drawback is that the assay provides 
little or no information about the effects of a compound at 
the cellular or organismal level, in particular information 
concerning the actual cellular pathways affected. These 
effects must be tested by using the drug in a series of cell 

25 biologic and whole animal studies to determine toxicity or 
side effects in vivo. In fact, analysis of the specificity 
and toxicity studies of candidate drugs can consume a 
significant fraction of the drug development process (see, 
e.g., Oliff et al. , 1997, "Molecular Targets for Drug 

30 Development, 11 in DeVita et al. Cancer: P rinciples & Practice 
of Oncology 5th Ed. 1997 Lippincott-Raven Publishers, 
Philadelphia) . 

Several gene expression assays are now becoming 
practicable for quantitating the drug effect on a large 

35 fraction of the genes and proteins in a cell culture (see, 
e.g., Schena et al, 1995, Quantitative monitoring of gene 
expression patterns with a complementary DNA micro-array, 
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Science 270:467-470; Lockhort et al., 1996, Expression 
monitoring by hybridization to high-density oligonucleotide 
arrays, Nature Biotechnology 14:1675-1680; Blanchard et al . , 
1996, Sequence to array: Probing the genome's secrets, Nature 
Biotechnology 14, 1649; 1996, U.S. Patent 5,569,588, issued 
October 29, 1996 to Ashby et al. entitled "Methods for Drug 
Screening") . Raw data from these gene expression assays are 
often difficult to coherently interpret. Such measurement 
technologies typically return numerous genes with altered 
expression in response to a drug, typically 50-100, possibly 
up to 1,000 or as few as 10. * In the typical case, without 
more analysis, it is not possible to discern cause and effect 
from such data alone. The fact that one or a few genes among 
many has an altered expression in a pair of related 
biological states yields little or no insight into what 
caused this change and what the effects of this change are. 
These data in themselves do not inform an investigator about 
the pathways affected or mechanism of action. They do not 
indicate which effects result from effects on a primary 
pathway versus which effects are the result of other 
secondary pathways affected by the drug. Knowledge of all 
these affected pathways individually is useful in 
understanding efficacy, side-effects, toxicities, possible 
failures of efficacy, activation of metabolic responses, and 
so forth. Further, identification of all pathways of drug 
action can lead to discovery of alternate pathways suitable 
to achieve the original therapeutic purpose. 

Without effective methods of analysis, one is left to ad 
hoc further experimentation to interpret such gene expression 
results in terms of biological pathways and mechanisms. 
Systematic procedures . for guiding the interpretation of such 
data and such further experimentation, at least in the case 
of drug target screening, are needed. 

Thus, there is a need for improved (e.g., faster and 
less expensive) methods for characterizing drug activities, 
and cellular pathways affected by drugs based on effective 
interpretation of such data as gene expression data. The 
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present invention provides methods for rapidly identifying 
the molecular targets and pathways affected by candidate 
drugs and for characterizing their specificity. It further 
provides methods based on measurement methodologies other 
5 than gene expression analysis. 

3 BPMMARY OF THE INVENTION 

The invention provides methods for determining the 
10 primary and secondary biological pathways through which a 
drug acts on a cell, and identifying the proteins and genes 
which are affected via each pathway. The method involves 
comparing measurements of RNA or protein abundances or 
activities in response to drug exposure with measurements of 
is RNA or protein abundances or activities in pathways possibly 
affected by the drug in response to controlled, known 
perturbations of each pathway. RNA or protein abundances or 
activities are measured at varying strengths of drug 
exposure. The known pathway perturbations are controlled to 
20 be of varying strengths over a substantial part of the range 
from pathway inhibition up to pathway saturation. 
Additionally, the invention provides methods for verifying 
likely pathways affected by a drug by comparing measurements 
of RNA or protein abundances or activities in response to 
25 simultaneous drug treatment and controlled pathway 

perturbation. Further, the invention provides methods for 
comparing the effects of two different drugs by comparing 
measurements of RNA or protein abundances in response to 
exposure to a first drug with RNA or protein abundances in 
30 response to exposure to another drug or drugs. 

The methods of this invention are based on the discovery 
that analysis of biological pathways of drug action can be 
made robust and reliable by utilizing data covering a range 
of pathway perturbation strengths and drug exposure levels. 
35 in general, perturbations of biological pathways and drug 
exposure levels preferably cover the range from no effects 
all the way to saturation. In prior methods, such analysis 
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is often based on only two points within these ranges, namely 
no exposure or perturbation and fully saturating exposure or 
perturbation. Such limited information leads to less robust 
and reliable results. 
5 For example, these methods achieve significant benefits 

and improvements over methods of analysis based merely on use 
of genetic deletions or over-expression, which typically 
yield only data for fixed saturating conditions. First, 
genetic deletions or over-expression strains are not always 
10 available for the biological system of interest. Second, use 
of response data from a range of biological pathway 
perturbation strengths and drug exposure levels greatly 
improves the ability to distinguish effects mediated along 
different pathways. For example, a drug under study may have 
15 different potencies along two pathways which converge to 
affect an overlapping set of genes. Experiments spanning a 
range of pathway perturbation strengths and drug exposure 
levels can show that these pathways become effective at 
different drug exposure levels. On the other hand, data from 
20 genetic deletion mutations which completely interrupt each 
pathway are incapable of resolving such differences in 
potencies . 

In more detail, the present invention provides methods 
for identifying and representing the biological pathways of 
25 drug action on a cell by: (i) measuring responses of cellular 
constituents to graded exposures of the cell to a drug of 
interest; (ii) measuring the responses of cellular 
constituents to perturbations in one or more biological 
pathways of the cell; and (iii) scaling a combination of the 
30 measured pathway responses to fit the measured drug responses 
best according to an objective measure. In alternative 
embodiments, the present invention also provides for 
assessing the significance of the identified representation 
and for verifying that the identified pathways are actual 
35 pathways of drug action. In various embodiments, the 
responses of cellular constituents can be measured by 
measuring gene expression (i.e., RNA levels), protein 
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abundances, protein activities, or a combination of such 
measurements. In various embodiments, perturbation to a 
biological pathway in the cell can be made by use of 
titratable expression systems, use of transection systems, 
modification to abundances of pathway RNAs, modifications to 
abundances of pathway proteins, or modifications to 
activities of the pathway proteins. The present invention 
also provides methods for drug development based on the 
methods for identifying biological pathways of drug action. 

In a first embodiment, this invention provides a method 
of representing biological pathways involved in the action of 
a drug in a cell type comprising: (a) providing a drug 
response of said drug in said cell type, said drug response 
having been obtained by a method comprising measuring a 
15 plurality of cellular constituents in a cell of said cell 
type at a plurality of levels of exposure to said drug; (b) 
representing a model drug response as a combination of one or 
more biological pathway responses in said cell type, wherein 
a biological pathway response in said cell type is the 
20 product of a method comprising measuring cellular 

constituents of said biological pathway in a cell of said 
cell type at a plurality of levels of a perturbation to said 
biological pathway, and wherein each of said one or more 
biological pathway responses in said combination are subject 
25 to an independent scaling transformation; and (c) determining 
best scaling transformations of said one or more biological 
pathway responses which minimize the value of an objective 
function of the difference between said drug response and 
said model drug response, whereby said combination of said 
30 one or more biological pathways responses subject to said 
best scaling transformations represents the biological 
pathways involved in the action of said drug in said cell 
type. 

In a first aspect of the first embodiment, the invention 
35 further provides that said determining step further comprises 
determining an "actual" minimized value of said objective 
function, and, after said step of determining (i.e., step 
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(c) , above) , a step of assessing the statistical significance 
of said best scaling transformations of said one or more 
biological pathways by a method comprising: (a) obtaining an 
expected probability distribution of minimized values of said 
5 objective function; and (b) assessing the statistical 

significance of said actual minimum value of said objective 
function in view of said expected probability distribution of 
minimum values of said objective function, wherein said 
actual minimized value of said objective function is the 
10 minimized value of said objective function determined from 
said provided drug response and said model drug response. In 
this aspect of the first embodiment, the invention further 
provides that said step of obtaining said expected 
probability distribution of minimum values of said objective 
15 function further comprises the steps of: (a) randomizing 
said drug response with respect to said plurality of levels 
of drug exposure and randomizing said model drug response by 
randomizing said one or more biological pathway responses 
with respect to said plurality of levels of perturbation to 
20 said one or more biological pathways; (b) determining a 
"theoretical" minimized value of said objective function by 
finding best scaling transformations of said one or more 
randomized biological pathway responses which minimize said 
objective function of the difference between said randomized 
25 drug response and said randomized model drug response; and 
(c) repeating the two previous steps to determine a 
plurality of theoretical minimum values, said plurality of 
theoretical minimized values forming said expected 
probability distribution of minimized values. 
30 In a third aspect of the first embodiment, the invention 

further provides, after said step of determining, a step of 
verifying that said one or more biological pathways are 
biological pathways actually involved in the action of said 
drug in said cell type by a method comprising: (a) providing 
: 35 combined drug-perturbation responses in said cell type by a 
method comprising measuring a plurality of cellular 
constituents in a cell of said cell type exposed 
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simultaneously to one or more levels of said exposure to said 
drug and to one or more levels of perturbations in said one 
or more biological pathways; and (b) selecting which of the 
following model responses behaves most similarly to said 
5 combined drug-perturbation responses: (i) a first model 
response comprising said combination of said one or more 
biological pathway responses subject to said best scaling 
transformations evaluated at one or more first sums, each 
said first sum being the sum of one of said one or more 
10 levels of drug exposure subject to said scaling 

transformations and one of said one or more levels of 
perturbations to said biological pathways, (ii) a second 
model response comprising said one or more second sums, each 
said second sum being the sum of said drug response evaluated 
15 at one of said one or more levels of drug exposure and said 
combination of said one or more biological pathway responses 
subject to said best scaling transformations evaluated at one 
of said one or more levels of perturbations to said 
biological pathways, whereby said one or more biological 
20 pathways are verified as biological pathways actually 

involved in the action of said drug in said cell type if said 
first model response is selected. 

in a fourth aspect of the first embodiment, the 
invention further provides, after said step of determining, a 
25 step of assigning a cellular constituent present in said drug 
response to the one of said one or more biological pathways 
in which said biological pathway response of said cellular 
constituent subject to its best scaling transformation has 
the greatest correlation with said drug response of said 

30 cellular constituent. 

in a second embodiment, this invention provides a method 
of determining a more pathway-specific drug candidate from an 
initial drug candidate comprising: (a) representing the 
biological pathways involved in the action of an initial drug 

35 candidate by the method of the first embodiment; (b) 

modifying the structure of said initial drug candidate; (c) 
representing the biological pathways involved in the action 
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of .aid .calf lea initial drug canape by the method of the 
first embodiment; and <d, defining that said modified 
initial drug candidate is a .ore pathway-specific <*»5 
candidate than said initial drug candidate if said modified 
S Tnill drug candidate has fever biologic*! pathways invoived 
in its action than said initial drug candidate. 

in a third embodiment, this invention provides a method 
of identifying one or .ore specific biological ^^T- 
are invoived in th. action of a drug and that mediate side 
a „ offsets of the drug, said method comprising, (a) carrying out 
the method of the first embodiment for , first drug; (b, 
carrying out the method of the first embodiment for a second 
^ wherein the first and the second drug are different and 
exhibit therapeutic efficacy for the same disease or 
is disorder; and (c) identifying those specific biological 
" Xs involved in the action of said first 

different from those biological pathways involved in the 
action of said second drug, thereby identifying one or more 
s^ific biological pathways that are involved in the action 
20 Of said first drug and that mediate side-effects of said 

""^fourth embodiment, this invention provides a method 
of identifying one or more specific biological pathway* ^hat 
are involved in mediating therapeutic efficacy for a disease 
« or disorder, said method comprising: (a) carrying out the 
" metL of tie first embodiment for a first drug; <b> carrying 
out the method of the first embodiment for a second drug, 
wherein the first and the second drug are different and 
exhibit therapeutic efficacy for the same disease or 
,„ border- and (c) identifying those specific biological 
" TZZ involved in the action of both said first drug and 
said second drug, thereby identifying one or more specific 
biological pathways that are involved in th. action of said 
first drug and that mediate therapeutic efficacy for said 

disease or disorder. 

I„ a fifth embodiment, this invention provides a method 
for comparing drug responses from two different drugs on a 
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cell type, and thereby measuring the similarity of the 
effects of the two different drugs on said cell type by: (a) 
providing a first drug response for a first drug of interest 
in said cell type, said drug response having been obtained by 
S a method comprising measuring a plurality of cellular 

constituents in a cell of said cell type at a plurality of 
levels of exposure to said first drug; (b) providing a 
second drug response for a second drug of interest in a cell 
type, said second drug response having been obtained by a 
10 method comprising measuring a plurality of cellular 

constituents in a cell of said cell type at a plurality of 
levels of exposure to said second drug; and (c) determining 
best scaling transformation of said second drug response 
which minimizes the value of an objective function of the 
15 difference between said first and second drug responses. 

in a sixth embodiment, this invention provides a method 
of representing biological pathways involved in the effect of 
an environmental change upon a cell type comprising: (a) 
providing an environmental response to said environmental 
20 change in said cell type, said environmental response having 
been obtained by a method comprising measuring a plurality of 
cellular constituents in a cell of said cell type at a 
plurality of degrees of severity of said environmental 
change; (b) representing a model environmental response as a 
25 combination of one or more biological pathway responses in 
said cell type, wherein a biological pathway response in said 
cell type is the product of a method comprising measuring 
cellular constituents of said biological pathway in a cell of 
said cell type at a plurality of levels of a perturbation to 
30 said biological pathway, and wherein each of said one or more 
biological pathway responses in said combination are sub 3 ect 
to an independent scaling transformation; and (c) determining 
best scaling transformations of said one or more biological 
pathway responses which minimize the value of an objective 
35 function of the difference between said environmental 

response and said model environmental response, whereby said 
combination of said one or more biological pathway responses 
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subject to said best scaling transformations represents the 
biological pathways involved in the effect of said 
environmental change upon said cell type. 

in a seventh embodiment, this invention provides a 
5 computer system for representing biological pathways involved 
in the action of a drug in a cell type comprising a processor 
and a memory coupled to said processor, said memory encoding 
one or more programs, said one or more programs causing said 
processor to perform a method comprising the steps of : (a) 
10 receiving a drug response of said drug in said cell type, 
said drug response comprising measurements of a plurality of 
cellular constituents in a cell of said cell type at a 
plurality of levels of drug exposure; (b) receiving one or 
m ore biological pathway responses, each of said one or more 
15 biological pathway responses comprising measurements of 

cellular constituents of said biological pathway in a cell of 
said cell type at a plurality of levels of a perturbation to 
said biological pathway; (c) forming a model drug response as 
a combination of said one or more biological pathway, each of 
20 said one or more biological pathway responses in said 

combination subject to an independent scaling transformation; 
(d) determining the value of an objective function of the 
difference between said drug response and said model drug 
response; and (e) minimizing said determined value of said 
25 objective function by varying the scaling transformations of 
said one or more biological pathway responses to obtain best 
scaling transformation that minimize said determined value of 
said objective function; whereby said combination of said one 
or more biological pathways responses subject to said best 
30 scaling transformations represents the biological pathways 
involved in the action of said drug in said cell type. 

4 ffB TFy nEsnBTTOTOM P » TF* DRRWINGS 
Fig. l illustrates exemplary pathways hypothesized for 
- 35 the action of drug D on a biological system. 

Fig. 2 A illustrates exemplary responses of expression of 
genes Gl, G2, and G3 in the biological system of Fig. 1 to 
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exposure to drug D (values are normalized to untreated 
value); Fig. 2B illustrates exemplary responses of genes Gl, 
G2, and G3 in pathway originating at protein Pi to graded 
perturbations of PI; Fig. 2C illustrates an exemplary 
5 correlation between response illustrated in Figs. 2A-B. 

Fig. 3 illustrates response curves of the 30 yeast 
genes, out of approximately 6000 measured yeast genes, that 
had the largest expression ratio changes to methotrexate drug 
exposure; methotrexate exposure levels were 3, 6, 25, 50, 
10 100, and 200 pa; the 100 »m titration resulted in a 50% 
growth defect; responses have been set to zero at the 
arbitrary abscissa of -0.5. 

Fig. 4 illustrates the fit of a Hill function to the 
response of gene YOL031C illustrated in Fig. 3. 
15 Fig. 5 illustrates a flow chart of an embodiment of the 

methods of the invention. 

Fig. 6 illustrates possible alternative pathways for the 

action of drug D on Gene G k . 

Figs 7A-B illustrate surface renderings of Eqns. 10 and 

20 11. 

Figs. 8A-C illustrate response curves of the yeast genes 
that had the largest expression ratio changes to exposure to 
the drugs cyclosporin A, methotrexate, and FK506, 
respectively; Fig. 8D illustrates a correlation of the 
25 responses illustrated in Fig. 8C to the sum of the response 
in Figs. 8A-B. 

Fig. 9 illustrates an exemplary embodiment of a computer 
system of this invention. 

30 5 PETAILED nBBCRIPTION 

This section presents a detailed description of the 
invention and its application to drug discovery. This 
description is by way of several exemplary illustrations, in 
increasing detail and specificity, of the general methods of 
35 this invention. These examples are non-limiting, and related 
variants that will be apparent to one of skill in the art are 
intended to be encompassed by the appended claims. Following 
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these examples are descriptions of embodiments of the data 
gathering steps that accompany the general methods. 

S.l TMTRODPCTION 
5 The invention includes methods for determining the 

biological pathways through which a drug acts on a biological 
system {e.g., a cell, or an organism, or a patient). These 
methods involve comparing measurements of changes in the 
biological state of a cell in response to graded drug 
10 exposure with measurements of changes in the biological state 
of biological pathways that are likely to be involved in the 
effects of the drug, the changes being in response to known 
and graded perturbations of these pathways. Output from this 
comparison is a representation of the action of the drug on 
15 the cell as a combination of independent actions of the drug 
on each individual biological pathways. 

This section first presents certain preliminary concepts 
including those of drug action, of the biological state of a 
cell, and of biological pathways, which, according to this 
20 invention, represent drug action in a cell. Next, a 

schematic and non-limiting overview of the methods of this 
invention is presented. The following sections present the 
methods of this invention in greater detail. 

Although, for simplicity this disclosure often makes 
25 reference to single cells (e.g., "RNA is isolated from a cell 
perturbed at a single gene"), it will be understood by those 
of skill in the art that more often any particular step of 
the invention will be carried out using a plurality of 
genetically similar cells, e.g., from a cultured cell line. 
30 Such similar cells are called herein a "cell type". Such 
cells are either from naturally single celled organisms or 
derived from multi-cellular higher organisms. 

In particular, Section 5.1 describes certain preliminary 
concepts useful in the further description of this invention. 
35 section 5.2 generally describes the methods of this 
invention. Section 5.3 describes a preferred analytic 
embodiment of the methods of this invention. Section 5.4 
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describes methods of perturbing biological pathways. Section 
5 5 describes methods of measuring cellular constituents. 
Finally, Section 5.6 describes certain exemplary applications 
of this invention to drug discovery and development. 

5 

pr-T »rM™ and Biological State 

According to the current invention, drugs are any 
compounds of any degree of complexity that perturb a 
biological system, whether by known or unknown mechanisms and 
10 whether or not they are used therapeutically. Drugs thus 
include: typical small molecules of research or therapeutxc 
interest; naturally-occurring factors, such as endocrine, 
paracrine, or autocrine factors or factors interacting with 
cell receptors of all types; intracellular factors, such as 
15 elements of intracellular signaling pathways; factors 
isolated from other natural sources; and so forth. The 
biological effect of a drug may be a consequence of, inter 
alia, drug-mediated changes in the rate of transcription or 
degradation of one or more species of RNA, the rate or extent 
20 of translation or post-translational processing of one or 
more polypeptides, the rate or extent of the degradation of 
one or more proteins, the inhibition or stimulation of the 
action or activity of one or more proteins, and so forth. In 
fact, most drugs exert their affects by interacting with a 
25 protein. Drugs that increase rates or stimulate activities 
of a protein are called herein "activating drugs," while 
drugs that decrease rates or inhibit activities of a protein 
are called herein "inhibiting drugs." 

in addition to drugs, this invention is equally 
30 applicable to those changes in or aspects of the physical 
environment that perturb a biological system in targeted 
manners, such environmental changes can include moderate 
changes of temperature (e.g., a temperature elevation of 
10 • C) or exposure to moderate doses of radiation. Other 
35 environmental aspects include the nutritional environment, 
such as the presence of only particular sugars, amino acids, 
and so forth. 
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The biological effects of a drug (or a physical 
environmental change) are measured in the instant invention 
by observations of changes in the biological state of a cell. 
The biological state of a cell, as used herein, is taken to 
5 mean the state of a collection of cellular constituents, 

which are sufficient to characterize the cell for an xntended 
purpose, such as for characterizing the effects of a drug. 
The measurements and/or observations made on the state of 
these constituents can be of their abundances (i.e., amounts 
10 or concentrations in a cell) , or their activities, or thexr 
states of modification (e.g., phosphorylation), or other 
measurement relevant to the characterization of drug action, 
in various embodiments, this invention includes making such 
measurements and/or observations on different collections of 
15 cellular constituents. These different collections of 

cellular constituents are also called herein aspects of the 
biological state of the cell. (As used herein, the term 
..cellular constituents" is not intended to refer to known 
subcellular organelles, such as mitochondria, lysozomes, 

2 0 ©to ) 

' one aspect of the biological state of a cell usefully 
measured in the present invention is its transcriptional 
state. The transcriptional state of a cell includes the 
identities and abundances of the constituent RNA species, 
25 especially mRNAs, in the cell under a given set of 

conditions. Preferably, a substantial fraction of all 
constituent RNA species in the cell are measured, but at 
least, a sufficient fraction is measured to characterize the 
action of a drug of interest. The transcriptional state is 
30 the currently preferred aspect of the biological state 
measured in this invention. It can be conveniently 
determined by, e.g. ,. measuring cDNA abundances by any of 
several existing gene expression technologies. 

Another aspect of the biological state of a cell 
35 usefully measured in the present invention is its 

translational state. The translational state of a cell 
includes the identities and abundances of the constituent 
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protein species in the cell under a given set of conditions. 
Preferably, a substantial fraction of all constituent protein 
species in the cell are measured, but at least, a sufficient 
fraction is measured to characterize the action of a drug of 
5 interest. As is known to those of skill in the art r the 
transcriptional state is often representative of the 
translational state. 

Other aspects of the biological state of a cell are also 
of use in this invention. For example, the activity state of 
10 a cell, as that term is used herein, includes the activities 
of the constituent protein species (and also optionally 
catalytically active nucleic acid species) in the cell under 
a given set of conditions. As is known to those of skill in 
the art, the translational state is often representative of 
15 the activity state. 

This invention is also adaptable, where relevant, to 
"mixed" aspects of the biological state of a cell in which 
measurements of different aspects of the biological state of 
a cell are combined. For example, in one mixed aspect, the 
20 abundances of certain RNA species and of certain protein 

species, are combined with measurements of the activities of 
certain other protein species. Further, it will be 
appreciated from the following that this invention is also 
adaptable to other aspects of the biological state of the 
25 cell that are measurable. 

Drug exposure will typically affect many constituents of 
whatever aspect of the biological state of a cell is being 
measured and/ or observed in a particular embodiment of this 
invention. For example, as a result of regulatory, 
30 homeostatic, and compensatory networks and systems known to 
be present in cells, even an "ideal drug," i.e., a drug that 
directly affects only a single constituent in a cell, and 
without direct effects on any other constituent, will have 
complicated and often unpredictable indirect effects. 
35 Consider, for example, a drug that specifically and 
completely inhibits activity of a single hypothetical 
protein, protein P. Although the drug itself will directly 
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change the activity of only protein P, additional cellular 
constituents that are inhibited or stimulated by protein P, 
or which are elevated or diminished to compensate for the 
loss of protein P activity will also be affected. Still 
5 other cellular constituents will be affected by changes in 
the levels or activity of the second tier constituents, and 
so on. Therefore, the direct effect of the drug on its 
target, protein P, is hidden in the large number of indirect 
effects downstream from protein P. Such downstream effects 

10 of protein P are called herein the biological pathway 
originating at protein P (see below) • 

Accordingly, a drug that is not ideal, e.g., one that 
directly affects more than one molecular target, may have 
still more complicated downstream effects. In one aspect, 

15 according to the present invention, the analysis of these 
effects provides considerable information about the drug 
including, for example, identification of biological pathways 
effected by the drug and which explain its action and side 
effects of toxicities in the cell. In a related aspect, the 

20 present invention provides methods for carrying out this 
analysis. 

Measurement of the transcriptional state of a cell is 
preferred in this invention, not only because it is 
relatively easy to measure but also because, although a drug 

25 may act through a post- transcriptional mechanism (such as 

inhibition of the activity of a protein or change in its rate 
of degradation) , the administration of a drug to a cell 
almost always results in a measurable change, through direct 
or indirect effects, in the transcriptional state. A reason 

30 that drug exposure changes the transcriptional state of a 

cell is because the previously mentioned feedback systems, or 
networks, which react in a compensatory manner to infections, 
genetic modifications, environmental changes, including drug 
administration, and so forth, do so primarily by altering 

35 patterns of gene expression or transcription. As a result of 
internal compensations, many perturbations to a biological 
system, although having only a muted effect on the external 
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behavior of the system, can nevertheless profoundly influence 
the internal response of individual elements, e.g., gene 
expression, in the cell. 

5 Biologic? l Pathways 

In the instant invention, drug effects on a cell, 
whether an ideal or a non- ideal drug and however measured in 
a particular implementation, are represented by combining the 
effects of the drug on individual biological pathways. For 
10 example, Fig. 1 illustrates that drug D acts on a cell by 
interacting with biological pathways 101, 102, and 103 
(details of pathway 103 are not illustrated) . The arcs 
between drug D and these pathways represent possible action 
of drug D on these pathways. The entire action of drug D on 
15 the cell is assumed to be expressible as a combination of 
drug D's actions on one or more of these three pathways. In 
the following paragraphs, first, biological pathways as 
generally used according to this invention are described, 
followed by description of particular biological pathways to 
20 which this invention is advantageously applied. 

As used herein, a biological pathway is generally 
understood to be a collection of cellular constituents 
related in that each cellular constituent of the collection 
is influenced according to some biological mechanism by one 
25 or more other cellular constituents in the collection. The 
cellular constituents making up a particular pathway can be 
drawn from any aspect of the biological state of a cell, for 
example, from the transcriptional state, or the translational 
state, or the activity state, or mixed aspects of the 
30 biological state. Therefore, cellular constituents of a 
pathway can include mRNA levels, protein abundances, protein 
activities, degree of protein or nucleic acid modification 
(e.g., phosphorylation or methylation) , combinations of these 
types of cellular constituents, and so forth. Each cellular 
35 constituent of the collection is influenced by at least one 
other cellular constituent in the collection by some 
biological mechanism, which need not be specified or even 

- 18 - 



WO 9*9/58708 



PCT/US99/10050 



understood. In illustrations presented herein, the 
influence, whether direct or indirect, of one cellular 
constituent on another is presented as an arc between the two 
cellular constituents, and the entire pathway is presented as 
5 a network of arcs linking the cellular constituents of the 
pathway. A biological pathway, therefore, refers both to the 
collection of cellular constituents drawn from some aspect of 
the biological state together with the network of influences 
between the constituents. 
10 For example, in Fig. 1, biological pathway 101 includes 

protein PI (for example, either the abundance or activity of 
PI) and genes Gl, G2, and G3 (for example, their transcribed 
mRNA levels) together with the influence, direct or indirect, 
of protein PI on these three genes, represented as the arc 
15 leading from Pi to these three genes. The mechanism of this 
influence might arise, for example, because protein PI can 
bind to promoters of these genes and increase the abundance 
of their transcripts. 

Concrete examples of biological pathways, as understood 
20 herein, are well known in the art. They depend on various 
biological mechanisms by which the cellular constituents 
influence one another. For example, biological pathways 
include well-known biochemical synthetic pathways in which, 
for example, molecules are broken down to provide cellular 
25 energy or built up to provide cellular energy stores, or in 
which protein or nucleic acid precursors are synthesized. 

- The cellular constituents of synthetic pathways include 

_ enzymes and the synthetic intermediates, and the influence of 
a precursor molecule on a successor molecule is by direct 
30 enzyme-mediated conversion. Biological pathways also include 
signaling and control pathways, many examples of which are 
also well known. Cellular constituents of these pathways 

- include, typically, primary or intermediate signaling 
molecules, as well as the proteins participating in the 

35 signal or control cascades usually characterizing these 
pathways. In signaling pathways, binding of a signal 
molecule to a receptor usually directly influences the 
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abundances of intermediate signaling molecules and indxrectly 
influences on the degree of phosphorylation (or other 
edification) of pathway proteins. Both of these 
turn influence activities of cellular protexns that are key 
5 effectors of the cellular processes initiated by the sxgnal, 
for example, by affecting the transcriptional state of the 
cell. Control pathways, such as those controlling the txmxng 
and occurrence of the cell cycle, are similar. Here, 
multiple, often ongoing, cellular events are temporally 
XO coordinated, often with feedback control, to achieve a 
consistent outcome, such as cell division with chromosome 
segregation. This coordination is a consequence of 
functioning of the pathway, often mediated by mutual 
influences of proteins on each other's degree of 
15 phosphorylation (or other modification) . Also, well Known 
control pathways seek to maintain optimal levels of cellular 
metabolites in the face of a fluctuating environment. 
Further examples of cellular pathways operating accordxng to 
understood mechanisms will be known to those of skill xn the 
2 0 stir fc» 

" pathways of particular interest in this invention are 
defined as those that "originate" at particular cellular 
constituents, especially hierarchical pathways that orxgxnate 
at particular cellular constituents. A pathway orxginatxng 
25 at particular cellular constituents includes those particular 
cellular constituents, a second group of cellular 
constituents that are directly influenced by the particular^ 
cellular constituents, a third group of cellular constituents 
that are directly influenced by the second group of cellular 
30 constituents, and so forth, along with the network of 
influences between the groups of cellular constituents, 
influences between the cellular constituents can be according 
to any biological mechanism, for example, a signaling 
mechanism, or a regulatory or homeostatic control »echanxsm 
35 or a synthetic mechanism. In Fig. 1. pathway 101, xncluding 
a protein and several genes, originates at protein Pi. 
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Pathway 102, including two proteins and several genes, 
originates at proteins P2 and P3. 

Biological pathways can also be either hierarchical or 
non-hierarchical. Generally, a hierarchical biological 
5 pathway has no feedback loops. In more detail, a 
hierarchical pathway is one in which its cellular 
constituents can be arranged into a hierarchy of numbered 
levels so that cellular constituents belonging to a 
particular numbered level can be influenced only by cellular 
10 constituents belonging to levels of lower numbers. A 
hierarchical pathway originates from the lowest numbered 
cellular constituents. In Fig. 1, pathways 101 and 102 are 
hierarchical. Pathway 101 is clearly hierarchical. In 
pathway 102, proteins P2 and P3, on a lowest numbered level, 
15 both (directly) affect gene G, on an intermediate numbered 
level. In turn, gene G (perhaps indirectly) affects genes 
G4, G5, and G6, all on a highest numbered level. In 
contrast, a non-hierarchical pathway has one or more feedback 
loops. A feedback loop in a biological pathway is a subset 
20 of cellular constituents of the pathway, each constituent of 
the feedback loop influences and also is influenced by other 
constituents of the feedback loop. For example, in pathway 
102 of Fig. 1, if gene G6 (perhaps indirectly) affected 
protein P3, a feedback loop including genes G and G6 and 
25 protein P3 would be created. 

In summary, therefore, as used herein, a biological 
pathway includes a collection of cellular constituents that 
influence one another through any biological mechanism, known 
or unknown, such as by a cell's synthetic, regulatory, 
30 homeostatic, or control networks. The influence of one 
cellular constituent on another can be, inter alia, by a 
synthetic transformation of the one cellular constituent into 
the other, by a direct physical interaction of the two 
cellular constituents, by an indirect interaction of the two 
35 cellular constituents mediated through intermediate 

biological events, or by other mechanisms. Further, certain 
pathways that are of particular interest in this invention 
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can be said to originate at particular cellular constituents, 
which influence, but are not in turn influenced by, the other 
cellular constituents in the pathway and among such pathways, 
those without feedback loops are said to be hierarchical. 
5 Because this invention is directed to representing drug 

action by combinations of biological pathways, certain types 
of pathways are of particular interest* Drugs typically act 
on a cell by directly interacting with one cellular 
constituent, and more usually with a plurality of 5 to 10 to 

10 50 or more cellular constituents. Such cellular constituents 
are called herein the "targets" of the drug. Further effects 
of the drug on the cell flow from the other cellular 
constituents influenced, directly or indirectly, by the 
direct targets of the drug. Therefore, pathways of interest 

15 in this invention for representing drug action include those 
that originate at particular cellular constituents, and 
especially, are hierarchical. In particular, the originating 
cellular constituents are preferably those that are potential 
drug targets. Since most drug targets are proteins, in 

20 particular, pathways originating at cellular proteins are of 
especial interest in representing drug action. Hierarchical 
pathways are advantageous in representing drug action, 
because the feedback loops present in non-hierarchical 
pathways can obscure drug effects by causing compensating 

25 influences in cellular constituents that mute drug 
influences. 

The following descriptions of the various embodiments of 
this invention, for economy of language only and without any 
limitation, are primarily directed to pathways, and often 

30 only to hierarchical pathways, originating at particular 
proteins. In view of the following description, it will be 
apparent to one of skill in the art how to apply the 
invention to pathways, including non-hierarchical pathways, 
originating at other cellular constituents, such as mRNA 

35 abundances. 

identification and Perturbation of Biological Pathways 
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Biological pathways, especially pathways that originate 
at proteins or that are hierarchical, can be identified for 
use in this invention by several means, including by use of 
known pathways and by measurements of aspects of the 
5 biological state of a cell. A known pathway, such as one of 
the exemplary types of pathways mentioned above, often 
includes known active proteins (or other types of cellular 
constituents) which may be drug targets. This entire known 
pathway can be used to represent drug action. Alternatively, 
10 parts (also called herein « sub-pathways") of such a known 
pathway can be used in this invention. For example, sub- 
pathways originating from any one or more of these known 
proteins (or other cellular constituents likely to be drug 
targets) include pathway constituents directly or indirectly 
15 influenced by the one or more known proteins (and excluding 
pathway constituents influencing these one or more proteins) . 
A plurality of sub-pathways can be derived from a single 
known pathway. One or more of these sub-pathways can be used 
to represent drug action in the methods of this invention. 
20 Biological pathways for use in this invention can also 

be identified in sufficient detail by measurements of aspects 
of the biological state of a cell, for example, by 
measurements of the transcriptional state, or of the 
translational state, or of the activity state, or of mixed 
25 aspects of the biological state. By measurements of an 

aspect of the biological state of a cell subject to various 
perturbing conditions, such as conditions resulting from 
exposure to various drugs or from various genetic 
manipulations, collections of cellular constituents that vary 
30 in a correlated fashion can be identified. Correlated 
variation means herein that ,the relative variation of the 
cellular constituents in the collection, in other words the 
pattern of variation of the cellular constituents, is similar 
in the different conditions. A network of mutual influences 
35 linking the collection of constituents into a biological 
pathway can be inferred from the similar pattern of 
variations in different conditions. When the various 



- 23 - 



WO 99/58708 



PCT7US99/10O50 



conditions during measurement act on the biological pathway, 
the constituents of the pathway respond with similar patterns 
of variation determined by the type and direction of their 
mutual influences. Even if neither the exact network of 
5 influences nor the mechanism of their action is known, this 
collection of constituents can be used as one biological 
pathway in this invention. 

For example, a drug known to act at a single defined 
target can be used to measure the pathway originating from 

10 this target. A cell is exposed to varying concentrations of 
the drug and the cellular constituents of an aspect of the 
biological state, for example, the transcriptional state, are 
measured. Those cellular constituents that vary in a 
correlated pattern as the concentrations of the drug are 

15 changed can be identified as a pathway originating at that 
drug. 

Additionally, as in the case of already known pathways, 
sub-pathways of a measured pathway can be determined if 
measurement during exposure to further conditions reveals 
20 that sub-collections of the original pathway vary according 
to different patterns. These differently varying sub- 
collections then constitute sub-pathways applicable in this 
invention. Cellular constituents of the measured pathway can 
be grouped according to the sub-pathway through which they 
25 are most affected. 

For example, where a pathway has been identified by 
measurements of a cell exposed to varying concentrations of a 
drug, sub-pathways can be identified by performing gene 
knockouts on the cell. By measuring, e.g., the 
30 transcriptional state of a cell exposed to the drug and 
having certain gene knockouts, sub-pathways of the drug 
pathway originating at the deleted gene can be identified. 

Fig. 3 illustrates an example of a pathway identified by 
measurement. This figure illustrates mRNA expression levels 
35 of 30 genes of the yeast Saccharomyces cerevisiae that, of 
the approximately 6000 genes in the genome of this yeast, had 
the largest expression changes in response to six different 



- 24 - 



WO 99/58708 



PCT/US99/I0050 



titrations of the drug methotrexate. These gene expression 
level measurements were made with gene transcript arrays as 
described in Section 5.4. Each of these 30 genes exhibited a 
correlated variation in response to exposure to various 
5 concentrations of methotrexate, in that each gene exhibited 
either a uniform increase or decrease from a native abundance 
to a saturation abundance in response to increasing 
concentrations of methotrexate. Accordingly, these 30 genes 
can be employed in this invention as a pathway, which 

10 encompasses cellular constituents of the transcriptional 

state influenced by methotrexate. Additionally, if exposure 
to further conditions, such as to different drugs or to drug 
knockouts, reveals additional patterns of behavior, then this 
group of 30 genes may be subdivided into yet additional sub- 

15 pathways. 

The methods of this invention employ measurements of 
graded perturbations of biological pathways. They compare 
measurements of graded perturbations of pathways likely to be 
relevant to the action of a drug with measurements of graded 

20 exposure of a cell to the drug in order to identify pathways 
actually involved in action of the drug. Graded pathway 
perturbations can be performed in several manners. In the 
case of known or measured pathways which originate from known 
proteins (or other cellular constituents) , the abundance or 

25 activity of these proteins (or other cellular constituents) 
can be perturbed in a graded manner by methods such as 
mutation, transfection, controllable promoter systems, or 
other drugs of specific known action. These methods are 
described in more detail in described in Section 5.4. 

30 Graded perturbations to the originating cellular 

constituents will be propagated to other cellular 
constituents of the pathway by means of the network of 
influences defining the pathway. The response data consist 
of, inter alia, gene transcript or protein abundance 

35 measurements for the genes or gene products in the affected 
pathway. Response data can be measured by methods described 
in more detail subsequently in Sections 5.5. 
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In the case of pathways defined by measurement, it is 
particularly advantageous if the constituents from which the 
pathway originates are identified. In that case, these 
originating constituents can be perturbed as described for 
5 known pathways. If the originating constituents are not 
identified, the conditions defining the pathway can be 
manipulated in a graded fashion. For example, in a pathway 
one of whose defining conditions is drug exposure, the drug 
exposure can be graded and the cellular constituents 
10 observed. If the defining conditions involve genetic 

manipulation, the genetic manipulation can be performed in a 
graded manner according to methods to be described in Section 
5.4. 

15 5.2 nBCOMPQ3INg PROS RESPONSES IN TO PATHWAY CONTRIBUTIONS 

This section presents, first, an overview of the methods 
of this invention, and second, an extended illustrative 
example of the principal of these methods. 

20 overview of the Methods of this Invention 

The methods of this invention determine the biological 
pathways through which a drug acts on a biological system by 
comparing measurements of changes in the biological state of 
a cell in response to graded drug exposure with measurements 

25 of changes in the biological state of biological pathways 
that are likely to be involved in the effects of the drug, 
the changes being in response to graded perturbations of 

these pathways. 

Aspects of the biological state of a cell, for example, 

30 the transcriptional state, the translational state or the 
activity state, are measured (as described in Section 5.6) in 
response to a plurality of strengths of drug exposure, 
preferably graded from drug absence to full drug effect. The 
collection of these measurements, optionally graphically 

35 presented, are called herein the "drug response". Pathway 
perturbations useful in this invention can be graded in 
varying strengths over a substantial part of the range from 
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complete pathway inhibition up to full pathway saturation. 
Aspects of the biological state of a cell which are similar 
to those measured in the drug response, e.g., the 
transcriptional state, are measured in response to a 
5 plurality of graded pathway perturbation strengths. The 
collection of these measurements, optionally graphically 
presented, are called herein the "pathway response 11 or 
"pathway signature". The pathway responses are preferably 
measured in experiments in which the activity or abundance of 

10 the leading protein or gene in the pathway is changed. 

Cellular constituents varying in the drug response are 
compared to cellular constituents varying in the pathway 
responses in order to find that biological pathway, or 
combination of biological pathways, which matches all or 

15 substantially all of the drug response. Substantially all of 
a drug response is matched by pathway responses when most of 
the cellular constituents varying in the drug response are 
found to vary in a similar fashion in one or more of the 
pathway responses. Preferably, at least 75% of the cellular 

20 constituents varying in the drug response can be matched, 

more preferably at least 90% can be so matched, and even more 
preferably at least 95% can be so matched. Cellular 
constituents vary in a similar fashion in two responses when 
both sets of data are likely to be the same in view of 

25 experimental error. 

In a preferred embodiment, comparison of a drug response 
with one or more pathway responses is performed by a method 
in which an objective measure of differences between the 
measured drug response and a model drug response is 

30 minimized. The model drug response is constructed by 

combining the pathway responses of those pathways considered 
likely to be involved in the effects of the drug. If a 
particular cellular constituent varies in only one pathway 
response, the variation of that cellular constituent in the 

35 model drug response is the variation in that one pathway 

response. If a particular cellular constituent varies in two 
or more pathway responses, the variation of that cellular 
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constituent in the model drug response is a combination of 
the variation in the pathway responses. This combination can 
be performed additively or by another numerical combination 
(see Section 5.3). Since the relation of the strength of the 
5 drug (described, for example, by the kinetic constants 
describing its actions) to the effectiveness of the graded 
pathway perturbation (described, for example, by arbitrary 
measures of a perturbation control parameter) is not known, 
an adjustable scaling is made between the intensity of the 

10 graded perturbations for each pathway response that are 
combined in the model drug response and the graded drug 
exposures. The variations of the cellular constituents are 
combined together into the model drug response with 
adjustable scalings. The adjustable scaling for one pathway 

15 is usually independent of the scalings for the other 
pathways . 

In one embodiment, the objective measure can be 
minimized by adjusting the scaling of each pathway response 
in the model drug response and/or by varying the number or 

20 identity of biological pathways combined in the model drug 
response. Varying the pathways combined in the model drug 
response can be simply achieved by setting the adjustable 
scalings in the biological pathways not desired so that no 
variation in the cellular constituents occurs. In a 

25 preferred embodiment, where the adjustable scalings are 
performed by linear transformation between the pathway 
perturbation parameters and the drug exposure, minimization 
of the objective measure can be performed by standard 
techniques of numerical analysis. See, e.g., Press et al., 

30 1996, Numerical Recipes in C , 2nd Ed. Cambridge Univ. Press, 
Ch. 10.; Branch et al., 1996, Matlab Optimizati on Toolbox 
User's Guide . Mathworks (Natick, MA). Also, the method of 
numerically combining variations of the same cellular 
constituent from different pathways can be varied. For 

35 example, multiplicative cross-product terms could be included 
which would represent, inter alia, multiplicative responses 
from multiple transcription factors coming together from 
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different convergent pathways to form a transcription 
complex. 

The pathways combined in the model drug response in 
order to represent measured drug response in advance of 
5 minimization of the objective function can be chosen in 
various ways. Most simply a large collection of biological 
pathways covering many cellular functions can be combined 
with independently adjustable scalings; the objective measure 
minimized; and the combination of biological pathways best 

10 representing the drug response determined* A "compendium" of 
biological pathways is a set of pathways which is 
substantially complete in the biological system used for the 
assay, or at least sufficiently complete to cover all 
pathways likely to be relevant for drug action. Preferably, 

15 the minimization is made more efficient if the collection of 
pathways can be narrowed to those likely to be involved in 
the action of the drug. Such narrowing can be predicated on, 
for example, prior knowledge of drug effect and biological 
pathway significance. 

20 More preferably, pathways are selected that originate at 

particular cellular constituents, and advantageously, sure 
also hierarchical (minimizing the muting effects of negative 
feedback loops or the amplifying effects of positive feedback 
loops) . Most preferably, the originating cellular 

25 constituents are likely to be targets of the drug of 

interest, usually functionally active proteins. For example, 
given a drug of interest and a selection of potential targets 
in the cell, first, the biological pathways originating at 
each of the potential targets can be measured (as previously 

30 described in Section 5.1). Second, these pathways can be 
combined with independent scaling factors, the objective 
measure minimized, and the combination of pathways best 
representing the drug's action determined. Thereby, along 
with determination of the actual pathways involved in drug 

35 action, the actual targets of the drug are also identified as 
the cellular constituents from which the actual pathways 
originate. 
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After the pathways involved in drug action are 
determined, they can be confirmed by the following additional 
methods of this invention. According to a first confirmation 
method, the significance of the pathways determined is 
5 decided based on statistical tests referencing the minimum 
value computed from the objective measure. One preferred 
test computes pathway representations as above with a 
plurality of randomizations of the drug response data in 
order to determine a distribution of minimum values of the 
10 objective measure. The statistical significance of the 

minimum value of the objective measure actually obtained from 
the un-randomized drug response data can be judged against 
this distribution. 

According to a second confirmation method, determined 
15 pathways can be confirmed by making measurements of a cell 
simultaneously both exposed to the drug and also having one 
or more of the determined pathways perturbed. By perturbing 
drug exposed cells (or drugging perturbed cells) , 
verification can be obtained that the pathway is in fact 
20 involved in the response of specific downstream genes and 
proteins. If the biological pathways perturbed are not 
involved in the action of the drug, the drug and the 
perturbations will produce independent, usually substantially 
additive, effects on the variation of cellular constituents. 
25 If the biological pathways perturbed are indeed involved in 
the action of the drug, the effects of the drug and the 
perturbations will not be independent. The effects will 
interfere and the variation of cellular constituents will 
saturate at values observed for either drug exposure or 
30 pathway perturbations alone. 

illustration of the Met hods of this Invention 

The following paragraphs generally illustrate several of 
the methods of this invention with respect to Fig. 1 and 
35 Figs. 2A-C. Fig. 1 illustrates drug D that may act on a cell 
through three potential pathways. Pathways 101 and 102 
originate with proteins PI and P2 and P3, respectively, and 
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ultimately influence the expression levels of the indicated 
genes, perhaps by influencing additional mediating cellular 
constituents. The details of pathway 103 are not 
illustrated. The methods of this invention determine which 
of these three pathways, alone or in some combination, 
explains the actual action of drug t> on the cell 

To make this determination, the methods of this 
invention attempt to represent drug D's action on the cell, 
that is its drug response, by a combination of the pathway 
responses of pathways 101, 102, and 103. This representation 
will be successful, and drug D's response will be adequately 
represented, for that combination of pathways which drug D 
actually affects. If the observed response of drug D can be 
represented adequately by only one of the pathway responses, 
that pathway is identified as being the only pathway of 
action for drug D. 

In the case of pathways 101 and 102 which originate at 
proteins PI and P2 and P3, respectively, the pathway 
responses can be directly determined by known perturbations 
of the abundance, or activity, or some other characteristic 
relevant for drug D's action, of the originating proteins. 
For example, application of variable perturbation 104 changes 
a relevant characteristic of protein PI, thereby influencing 
characteristics of the other cellular constituents in pathway 
101, for example, the expression levels of genes Gl, G2, and 
G3 . Perturbation 104 is capable of being applied in a graded 
fashion in order to generate pathway responses at a plurality 
of perturbation control values, from the native level of the 
characteristic of protein Pi perturbed to full saturation or 
inhibition of that characteristic. Similar known 
perturbations can be made to protein P2 and the expression 
levels of genes G4, G5, and G6 measured. 

Additionally, if the response of drug D on a cell can be 
represented as pathway responses generated by perturbing Pi 
or P2, one of skill on the art will appreciate that these PI 
or P2 are thereby identified as protein targets of drug D. 
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Fig. 2A illustrates a possible transcriptional response 
of a cell to drug D. The horizontal axis indexes the degree 
of drug exposure, for example , the concentration of the drug 
in the cell's environment, ranging from no exposure at the 
5 value 0 to saturating exposure at the value 5. The vertical 
axis indexes the logarithm of the ratio of the gene 
expression on exposure to drug D to the gene expression in 
the absence of drug D. Accordingly, the drug response curves 
all begin at 0 in the absence of drug D, corresponding to an 

10 expression ratio of 1. It is assumed for the purposes of 
this example that only genes Gl, G2, and G3 of a cell 
significantly respond to exposure to drug D with the response 
indicated by the labeled response curves. 

Although the gene response curves are presented for the 

15 purposes of illustration as continuous curves, in an actual 
experimentally determined drug response, expression ratios 
are measured for only a limited set of discrete levels of 
drug exposure . In an actual case, the graphical 
representation of a drug response would consist of expression 

20 ratios only at these discrete exposure levels. Preferably, 
the discrete drug exposure levels are chosen and positioned 
so that the steepest regions of the drug response curves are 
adequately sampled. Preferably, at least 5 and more 
preferably 10 or more exposure levels are positioned in these 

25 regions of the response curves, where the drug response 
varies from the unexposed level to the saturating level. 

Such response curves can be generated and measured by 
the methods of Sections 5.5. In particular, by employing 
technologies for gene expression analysis in concert with the 

30 genome sequence of the yeast S. cerevisiae, such response 
curves can be experimentally generated for nearly all of the 
genes in that yeast.. Although much of the description of 
this invention is directed to measurement and modeling of 
gene expression data, this invention is equally applicable to 

35 measurements of other aspects of the biological state of a 
cell, such as protein abundances or activities. 
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Fig. 2B illustrates a possible pathway response for 
pathway 101 (in Fig- 1), which originates with protein PI and 
involves the expression levels of genes Gl, G2, and G3, in 
response to perturbation 104 to originating protein Pi, The 
5 horizontal axis in this figure indexes the strength of 

perturbation 104 applied to PI, ranging from no perturbation 
of PI at the value 0 to saturating perturbation of PI at the 
value 5. Perturbation 104 can be either inhibiting or 
activating protein PI as the case may be. As set out in more 

10 detail in Section 5.4, such perturbation might be 

accomplished, inter alia, by transfection with varying 
amounts of a gene expressing PI in order to increase the 
abundance of PI, or by expression of PI under the control of 
a controllable promoter in turn controlled by a drug or small 

15 molecule, or by inhibition of PI activity by exposure to a 
different drug of specific known action against PI. 
Similarly to Fig. 2A, the vertical axis in Fig. 2B indexes 
the logarithm of the ratio of the gene expression on exposure 
to perturbation 104 to the gene expression in the absence of 

20 perturbation 104. The response of the expression levels of 
genes Gl, G2, and G3, which are components of pathway 101 
influenced by protein PI (whether directly or indirectly) , 
are illustrated by the labeled curves. 

Also similarly to Fig. 2A, although these pathway 

25 response curves are illustrated as continuous, in actual fact 
perturbation 104 to protein PI would be applied at a limited 
set of discrete values and the "curves" are actually 
expression ratio values at these discrete perturbation 
control parameter values. Also preferably, the discrete 

30 perturbation values are chosen and positioned so that the 
steepest regions of the pathway response curves are 
adequately sampled, .with at least 5 and more preferably 10 or 
more perturbation control parameter values positioned in the 
regions of the response curves where the responses vary from 

35 the unexposed level to the saturating level. 

The drug and pathway response curves in Figs. 2A and 2B 
illustrate the generally expected shape of such curves. This 
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expected shape includes a below threshold region at low drug 
exposure or perturbation control parameter over which there 
is effectively no response of the cellular constituents in 
the pathway. After this below threshold region, the drug or 
5 perturbation begins to be efficacious and the values of 
characteristics of the cellular constituents are perturbed. 
The curve of perturbed values is expected to usually have a 
monotonic increase or decrease toward an asymptotic level at 
saturation beyond which no further change is observed. The 

10 response curves terminate in this saturation region. 

In fact, more complicated, non-monotonic response curve 
shapes are possible and expected in some situations. For 
example, in the case where the drug or the perturbation has 
toxic effects, as toxicity sets in rising abundances of 

15 cellular constituents may start to fall and falling 

abundances may start to fall even faster. Also, nonlinear 
and feedback mechanisms known to be present in the biological 
systems may result in non-monotonic, multi-phasic responses. 
Such a response might first increase and then decrease with 

20 increasing perturbation amplitude or drug exposure. For 

example, a drug or a perturbation may act on certain cellular 
constituents through two pathways with different thresholds 
and with opposite effects to generate increasing then 
decreasing (or vice versa) responses. Although the methods 

25 of this invention are illustrated and primarily described 
with respect to monotonic response curves, such as 
illustrated Figs. 2A-B, as will be apparent to one of skill 
in the art from subsequent description, these methods are 
equally applicable to non-monotonic response curves. 

30 Having measured drug and pathway responses, the problem 

of determining the pathways by which drug D (of Fig. 1) acts 
on a cell requires matching the drug response as a 
combination of pathway responses. Fig. 2 A illustrates how 
the abundances of genes Gl, G2, G3, G4, G5, and G6 vary in 

35 the drug response of drug D. Since these same genes vary in 
the disjoint pathways originating at PI and P2, it can be 
determined according to the methods of this invention whether 
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either of these two pathway is actually involved in the 
response of drug D. 

According to the methods of this invention, these 
determinations are made by inquiring whether the pathway 
5 response curves of the pathways originating at PI and P2 can 
be transformed to match the drug response curves of Pig. 2A. 
Concerning only the pathway originating at protein PI, the 
determination of whether this pathway is actually involved in 
the action of drug D is met by attempting to transform the 
10 pathway response curves of this pathway, illustrated in Fig. 
2B, into the drug response curves for Gl, G2, and G3, 
illustrated in Fig. 2A. The drug response curves for G4, G5, 
and G6 need not be considered here because the pathway 
originating at PI does not affect these genes. 
15 The transformation of the pathway response curves of 

Fig. 2B into the drug response curves of Fig. 2A generally 
can have both a vertical and a horizontal component. No 
vertical transformation of these response curves is expected 
in this example. The amplitudes of both sets of response 
20 curves will be the same, since they both vary over the same 
range, from 0, in a resting state without perturbation or 
drug exposure, to saturation, in a state where both drug and 
the perturbation have maximally affected pathway 101. 
However, horizontal transformation is likely to be necessary. 
25 Because there is no reason for the values defining the 

perturbation control, such as the exposure value of a viral 
transfection vector expressing PI, or controllable promoter 
of PI expression, or another drug of specific known action on 
PI, to be the same as the values defining exposure to drug D 
30 under study, the drug and pathway response curves must be 
horizontally transformed in order to ascertain any possible 
match. Since the curves for Gl, G2, and G3 in Fig. 2B have 
the same general shape as the corresponding curves in Fig. 
2A, such a horizontally transformation is likely to be 
35 possible in this case. 

Finding a horizontal transformation, according to this 
invention, proceeds by parameterization of a class of 
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possible transformations. Then, optimum values of the 
parameters are sought that will make the pathway response 
explain the drug response as closely as possible. A 
preferable and simple class of transformations are linear 
5 scaling from values of the perturbation control parameter to 
values of the drug exposure, which are simply parameterized 
by the degree of stretch or shrinkage. Optimum values of the 
linear stretch can then be found by standard means, such as 
by minimization of an objective measure of the difference of 
10 the pathway and drug response curves. 

Fig. 2C sets forth an exemplary illustration of finding 
an optimum linear scaling parameter. The vertical axis of 
the graph of this figure indexes the average correlation 
value computed between the pathway response curves 61, G2, 
IS and G3 of Fig. 2B and the drug response curves Gl, G2, and 
G3, respectively, of Fig. 2A. It is well known in the art 
that, when two curves are identical, they will have a perfect 
correlation of 1.0. The horizontal axis indexes possible 
linear scaling parameters from 0 to 10. In this example, a 
20 perfect correlation value of 1.0 occurs at a scaling 

parameter of 2. The pathway response curves of Fig. 2B can 
be transformed with a linear scaling of 2 to fully match the 
drug response curves of Fig. 2A. Therefore, it can be 
concluded that the pathway originating at PI is one of the 
25 pathways of action of drug D. 

in order to determine whether the entire action of drug 
D can be explained by the pathways originating at Pi and P2, 
according to this invention the sum (the pathways are 
disjoint) of the both pathway responses (the response of the 
30 pathway originating at P2 is not illustrated) can be 

transformed into the response curves of all six genes to drug 



D. 



S.3 pMBODIMEMTS 

35 The analytic embodiments of the methods of this 

invention include, first, embodiments for representing drug 
response as a combination of pathway responses, and second, 
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embodiments for assessing the statistical significance and 
verifying the results of the representation found. 

Fig. 5 sets out a flow chart for a preferred embodiment 
of the methods of this invention. This embodiment determines 
5 a representative drug response data 510 for a particular drug 
in terms of pathway response data 511 for one or more 
pathways along with significance assessment and verification 
of the representation determined. 

In other embodiments of this invention, certain steps 

10 illustrated in Fig. 5 may be omitted or performed in orders 
other than as illustrated. For example, in certain 
embodiments candidate pathway selection , step 501, and 
scaling parameterization selection, step 502, can be 
performed once for the analysis of the response data from 

15 several, preferably related, drugs and need not be performed 
for each drug analysis separately. Also, in particular 
embodiments, pathway significance assignment and verification 
may not be performed, and accordingly, one or more of steps 
505 and 506, step 507, or step 508 may be omitted. 

20 

5.3.1 DRUG RESPONSE REPRESENTATION 

The representation of drug response data in terms of 
pathway response data preferably begins at step 501 with the 
selection of one or more candidate biological pathways with 

25 which to represent drug response data for a drug of interest. 
As discussed, the pathways preferably employed are those that 
originate at one or more cellular constituents, more 
preferably at constituents that are proteins likely to be 
targets of the drug of interest. Most preferably , the 

30 candidate pathways originate at single cellular constituents 
that are likely to be targets of the drug of interest. 

Where candidate drug targets are not known, single 
pathways can be chosen from among available pathways, perhaps 
stored in a compendium of pathways, and tested for 

35 significance in representing the drug response data according 
to the following steps illustrated in Fig. 5. Those pathways 
individually found to have significance in representing drug 



- 37 - 



WO 99/5S708 



PCT/US99/10050 



response data can then be employed combined, and the steps of 
Fig. 5 performed in order to determine the best pathway 
combination for representing drug action. A compendium of 
pathways is preferably substantially complete in the 
5 biological system used for the assay (in that it includes 
substantially all biological pathways in that system) , or at 
least includes substantially all pathways likely to be 
involved in drug action. 

Pathway response data are measured in step 511 for the 
10 pathways selected in step 501. In many cases, for example, 
where a pathway has been defined by measurement, response 
data will already have been measured for perturbations to the 
selected pathways. In other cases, this response data must 
be measured prior to the succeeding steps of this invention. 
15 As described above, response data for a pathway includes 

measurements of relative changes in relevant characteristics 
of the cellular constituents present in the pathway for a 
plurality of control levels of a perturbation to the pathway. 
For example, where the pathway is defined by gene expression 
20 levels originating at a protein constituent, the activity of 
the originating protein can be perturbed in a graded manner 
and the resulting ratios (or logarithms of these ratios) of 
native to perturbed gene expression levels are measured. The 
perturbation control levels are preferably chosen so that 
25 five or more, or more preferably ten or more, perturbation 
control levels are present in the region where the 
characteristics of the cellular constituents rapidly change 
from native levels to saturation levels. 

In the following, the variable "p" refers generally to 
30 perturbation control levels, and the variable "R" refers 

generally to the pathway response data. In detail, the l f th 
perturbation control level in the i'th biological pathway is 
referred to as "p*/'* The pathway response for the k*th 
cellular constituent in the i'th pathway is R^. Therefore, 
35 R^ k (p y ) is the response of the k'th cellular constituent in 
the i'th pathway at the I'th level of the perturbation 
control parameter. 
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Similarly, drug response data are obtained in step 510, 
and must be measured if not already available. As described 
above, these data are obtained by measuring changes in 
characteristics of cellular constituents at a plurality of 
5 levels of drug exposure (also called herein "levels of drug 
titration") . As with pathway response data, the drug 
exposure levels (or -drug titrations") are preferably chosen 
so that five or more, or more preferably ten or more, 
exposure values are present in the region where the 
10 characteristics of the cellular constituents rapidly change 
from native levels to saturation exposure levels. 

In the following, the variable "t" is used to refer 
generally to drug exposure (or "titration") levels, and the 
variable "D" refers generally to the drug response data. In 
15 detail, the l'th measured drug exposure level is referred to 
as "t,". The drug response for the k'th cellular constituent 
is D k . Therefore, D k (t,) is the drug response of the k'th 
cellular constituent at the l'th level of drug exposure. 

In the subsequent steps of these methods, in particular 
20 in step 504, values of the drug response data and the pathway 
response data may be needed at values of the drug exposure or 
perturbation control parameter which may not have been 
measured. This result follows from the fact that the 
measured drug exposure levels and pathway perturbation 
25 control parameters are not necessarily related. That is, for 
a particular 1, the variables t, and p u , for the various 
pathways, i, have no a priori relationship. Accordingly, it 
is necessary in step 502 to provide for interpolating of the 
various response data to obtain needed values. This 
30 interpolation method is preferably accomplished either by 
spline fitting or by model-fitting. The selection of an 
interpolation method and any necessary parameters are 
accomplished in step 502. 

In spline fitting, the drug and pathway response data 
35 are interpolated by summing products of an appropriate spline 
interpolation function, S, multiplied by the measured data 
values, as illustrated by the following equations. 
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R 1<k (u) =ES (U-p u ) Ri, k (Pi.i> 
D V (U) =£S (u-t x ) D k <t x > 



(1) 



5 The variable «u« refers to an arbitrary value of the drug 
exposure level or the perturbation control parameter at which 
the drug response data and the pathway response data, 
respectively, are to be evaluated. In general, S may be any 
smooth (at least piece-wise continuous) function of limited 
10 support having a width characteristic of the structure 

expected in the response functions. An exemplary width can 
be chosen to be the distance over which the response function 
being interpolated rises from 10% to 90% of its asymptotic 
value. Different S functions may be appropriate for the drug 
15 and the pathway response data, and even for the response data 
of different pathways. Exemplary S functions include linear 
and Gaussian interpolation. 

in model fitting, the drug and pathway responses are 
interpolated by approximating each by a single parameterized 
20 function. An exemplary model-fitting function appropriate 
for approximating transcriptional state data is the Hill 
function, which has adjustable parameters a, Uo. and n. 

H(u) = a(u/u ° )0 <2> 
H(U) 1 + (U/Uo)" 

25 

The adjustable parameters are selected independently for each 
cellular constituent of the drug response and for each 
cellular constituent of the pathway response. Preferably, 

30 the adjustable parameters are selected so that for each 

cellular constituent of each pathway response the sum of the 
squares of the distances of H(p u ) from R^CPy) is minimized, 
and so that for each cellular constituent of the drug 
response the sum of the squares of the distances of EM from 

35 D^t.) is minimized. This preferable parameter adjustment 
method is known in the art as a least squares fit of H() to 
F^O or to DkO. Other possible model functions are based on 
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polynomial fitting, for example by various known classes of 
polynomials . 

Model fitting with a Hill function is illustrated with 
respect to Figs. 3 and 4. As discussed, Fig. 3 illustrates 

5 an example of a pathway perturbed by methotrexate and 

identified by measurement. This figure illustrates the mRNA 
expression levels of 30 genes of the yeast S. cerevisiae 
that, of the approximately 6000 genes in the genome of this 
yeast, had the largest expression changes in response to six 

10 different exposure levels of methotrexate. Fig. 4 

illustrates a fit of the pathway response of one of these 
gene expression levels by a Hill function. In particular, 
the yeast gene YOL031C was fit by a Hill function with 
parameters n = 2, a - -0.61, and log, 0 (Uo) - 1.26 selected by 

15 the previously described least squares method. 

Since all of the 30 genes with largest responses behaved 
monotonically, i.e., none of the responses decreased 
significantly from its maximum amplitude (or increased 
significantly from its minimum amplitude) with increasing 

20 drug exposure, the Hill function is an appropriate model 
fitting function. For non-monotonic behavior it would not 

After selection of a response data interpolation method, 
the last step prior to drug response data fitting, step 503, 
25 is the selection of a scaling transformation, along with any 
necessary parameters, which will relate the biological 
pathway responses to the drug responses. In general, a 
scaling transformation may need to scale vertically as well 
as horizontally. Vertical scalings may be necessary to 

30 relate the various measurements of the relevant 

characteristics of each cellular constituent made in 
acquiring the response data. For example, such measurements 
might be of abundances of mRNA species or activities of 
proteins. Where these measurements are made in commensurate 

35 units, vertical scalings are needed merely to relate the 

various units of measurement. Alternatively, where both drug 
and pathway measurements are made across a range of 
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parameters from native levels to full saturation, as is 
preferable, these measurement* can be scaled, for example, by 
the saturation values. Such scaling obviates the need for 
any vertical scaling. In this case, for example, where 
5 pathway responses are interpolated by fitting with a Hill 
function, the value of the parameter -a" for all response 
data will be substantially equal to 1. In the following, it 
is assumed that any necessary vertical scaling by saturation 
values has been done and that all pathway data vary between 
10 common native level and saturation values. 

in general, horizontal scaling is expected to be 
necessary. As discussed above in Section 5.2, such scaling 
is necessary because values of the perturbation control 
parameters for the various candidate biological pathways are 
15 likely not to cause saturation responses at the same 
numerical perturbation control values nor at the same 
numerical value as the saturation response of the drug 
exposure. For example, the pathway perturbations may act 
according to such entirely different mechanisms as the 
20 titration of a viral transfection vector expressing a protein 
from which a pathway originates, or the control parameter of 
a controllable promoter controlling expression of an 
originating protein, or the exposure level of a drug of 
specific known action on an originating protein. The 
25 saturating control values of these mechanisms, and indeed 
their kinetic characteristics, are likely to be all 
unrelated. All of these mechanisms may be different from the 
action of the drug of interest. For example, where 
perturbation action on a cellular constituent from which a 
30 pathway originates can be modeled as a Hill function, there 
is no reason that the various "u^ parameters will be the 
same. 

The preferred horizontal scaling transformation is a 
linear transformation of the drug exposure level into 
35 corresponding perturbation control parameters. An exemplary 
expression of such a transformation follows. 
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Pi, i = «i fc J + Pi (3) 

Eqn. 3 provides the perturbation control value in the i*th 
pathway corresponding to the I'th drug exposure level. The 
5 linear scaling constants are CC| and ft. Each pathway is 

characterized by one set of scaling parameters. Generally, ft 
will be 0 since both drug exposure and perturbation control 
values begin with zero. In essence, oc; represents a ratio of 
the strengths of the particular pathway perturbation to the 
10 drug of interest. For example, where the response data can 
be modeled as Hill functions, oc s is the ratio of the Uo 
parameters of the drug of interest to that of the particular 
pathway. 

More general horizontal scaling transformations are 
15 possible characterized by additional parameters. Flexible 
scaling transformations are possible with a number of 
parameters small enough, even though nonlinear, to be 
usefully employed in the minimization procedure of step 504. 
Multiple scaling parameters for the i*th pathway are 
20 represented herein by Another example of a scaling 

transformation is a polynomial expansion generalizing the 
linear transformation of Eqn 3. A simple example of a more 
general scaling transformation is the previously described 
Hill function employed according to the following equation. 

Pl , . (4) 



Again, Eqn. 3 provides the perturbation control value in the 
i»th pathway corresponding to the l'th drug exposure level 
and is parameterized for each pathway by the three parameters 
a { , /i„ and iv The Hill function scaling is more general at 
least in that it reduces to a linear scaling when n is 1 and 
t, is much less than Mi- 

Step 504 is the central step of the methods of this 
invention in which the drug response is represented as a 
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combination of appropriately scaled pathway responses. The 
preferred representation of the drug response is as a scaled 
linear combination of the pathway responses . Such a 
representation is particularly useful when the cellular 

5 constituents affected by one pathway are either unaffected by 
the other pathways, or have linearly additive effects if 
multiple pathways converge on the same cellular constituent, 
such as an mRNA or protein abundance. Since the convergence 
or overlap of pathways is most likely far downstream of the 

XO primary targets, where the influences have branched out to 
include many genes, the effects of multiple pathways are more 
likely to accidentally act as independent and additive 
effects. If the effects converged through a new cellular 
constituent in the two pathways, independence and additivity 

15 is less likely. In such cases, multiplicative cross-product 
terms could be included which would represent, inter alia, 
multiplicative responses of a cellular constituent resulting 
from convergence of multiple pathways at that cellular 
constituent. Even in the latter case and in other cases 

20 where linear additivity does not hold, errors introduced by 
the linear additivity can be corrected with the techniques of 
Section 5.3.1. 

Therefore, preferably, the drug response data is 
represented in terms of the pathway response data according 

25 to the following equation. 

^(t x ) * £ Ri #k («i# tj) ; k = l,K; 1=1,L (5) 

Eqn. 5 represents the model drug response of the k f th 
cellular constituent at the l f th level of drug exposure in 

30 

terms of the sum of pathway responses for the k f th cellular 
constituent scaled according to the selected transformation 
parameterized by the o^. It is understood that in general, 
here and subsequently, that the F^O are interpolated 
35 according to the methods of step 502, since it is rarely the 
case that measurements will have been made at the 
perturbation control values given by the scaled drug exposure 
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levels. In cases where multiplicative cross-product terms 
are included (for example, in the cases previously described) 
Eqn. 5 would also include terms such as R i>k (ai, t 1 )R 1(k (a 1 t I .) . 
Sufficiently accurate solutions of this latter equation 
5 can be obtained by numerical approximation methods known in 
the art. These solutions determine the best scaling 
transformation so that the model drug response matches the 
drug response as closely as possible. Preferred methods 
provide a numerical indication (herein referred to as a 
10 "residual" } of the degree to which Eqn. 5 is not perfectly 

satisfied. According to a preferred method, pathway scaling 
parameters can be determined from the minimization of the 
related least squares approximation problem. 



15 In Eqn. 6, the inner sum of the R iK is over all interpolated 
pathway responses scaled according to the parameters a £ to 



for each biological pathway are generally a set of few 
parameters, such as from 1*5 parameters, defining the scaling 

20 transformation. The absolute square of the difference of 
this sum and the drug response at t 2 is in turn summed over 
all drug exposure levels, indexed by "1", and over all 
cellular constituents in the drug response or in the 
biological pathways, indexed by "k". The representation of 

25 „ the drug response in terms of the biological pathways is 
determined from the minimization of this latter sum with 
respect to the scaling transformation parameters for each 
pathway, the {a A }. The minimum value of this sum provides a 
numerical indication of the degree to which Eqn. 5 is 

30 satisfied, that is, the residual. 

For linear scale transformations, Eqn. 6 has the 
following simpler form. 



min { E£| D k (t x ) - £ R. rk {at^tj 




(6) 



correspond to the drug exposure level t 



The parameters a A 
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mindElD^tj) -U, l (« i t 1 )| ! ) (7) 



5 in Eqn. 7, each a ; is a single scaling constant for each 
biological pathway. Naturally, each a, depends on the units 
chosen for the drug exposure and those chosen for the 
perturbation control value as well as on the actual physical 
relation between the potency of the drug and the potency of 
10 the perturbation method. 

Minimization of least squares Eqns. 6 or 7 is performed 
using any of the many available numerical methods. See, 
e.g., Press et al., 1996, Numerical Recipes in C, 2nd Ed. 
Cambridge Univ. Press, Chs. 10, 14.; Branch et al., 1996, 
15 Wa f-.l a b QptlP*™^™ Top i^v Tier's Guide. Mathworks (Natick, 
MA) . A preferred method is the Levenberg-Marquandt method 
(described in Press at al., Section 14.4). Since there are K 
genes, and L level of drug exposure, Eqns. 6 or 7 represent 
KL individual equations. The number of unknowns is equal to 
20 the number of hypothesized pathways times the number of 
scaling parameters per pathway. In the case of linear 
scaling, the number of scaling parameters equals the number 
of pathways. Typically, the number KL is much larger than 
the number of scaling parameters so that the least squares 
25 problem is considerably over-determined, over-determination 
is advantageous in that it makes the solution robust, i.e., 
insensitive to measurement errors in individual cellular 
constituent responses. 

An alternative to the least-squares procedure outlined 
30 in Eqns. 6 and 7 for solving Eqn. 5 is to maximize the 

normalized correlation between the model drug response and 
the measured drug response. This procedure is closely 
related mathematically to the least squares procedure. 
According to this procedure the o, are determined from the 
35 solution to Eqn. 8. 
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max { 



( E (Ajj k ) 2 ECAj^/) 2 ) 1/2 C8) 

k k' 
{ ai l 



In this equation, p k (ai) is the correlation coefficient 
between the drug response data for the k*th cellular 
constituent and the model pathway response for the k'th 
10 cellular constituent. In detail, this correlation 
coefficient is given by Eqn. 9. 

ZD k (t x ) (£R imk {*it x )) 



n t 



15 



25 



In Eqn. 9, the inner sum (over i) represents the model drug 
response for the k f th cellular constituent. The product of 
the model and measured drug responses are summed over all 
levels of drug exposure, and the sum is normalized by the 
20 root-mean-square (also called herein "RMS") values of the 
these responses to give the correlation coefficients. 
Returning to Eqn, 8, the values of the correlation 
coefficient are preferably normalized by the amplitudes A^ 
and A^, which are the response amplitudes for the measured 
and model drug responses for the k'th cellular constituents. 
These amplitudes are chosen to be RMS values of the measured 
and model drug responses over all levels of drug exposure. 
This normalization gives greater weight to cellular 
constituents with larger amplitude responses, while ensuring 
30 that perfect correlation gives a value of unity. 

Alternatively and less preferably, the correlation 
coefficients can be unnormalized, in which case the 
amplitudes in Eqn. 8 are taken to be unity. Also, instead of 
the correlation coefficients, the negative of the correlation 
35 coefficients can be used, in which case the expression of 
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Egn . 8 is minimized (instead of maximized) to find the best 

scaling parameters. 

Bans. 8 and 9 can be solved by the methods described an 
the case of the least squares methods. It will be clear to 
5 those skilled in the art that the above fitting approach xs 
equivalent to minimizing the negative value of Eqn. 8. 

in both the least squares and the correlation methods, 
the summation of the pathway responses over the transformed 
drug exposure levels may lead to values outside of the 
XO measured interval of perturbation control parameters. This 
is because the scaling parameters, «„ can be substantially 
greater or less than unity, m order to avoid extrapolation 
of measured values, the sums in both cases (in Eqns. 6 and 8) 
are extended only over the interval in which there is 

15 measured data. 

When drug responses from two different drugs are being 
compared, the steps outlined above in this section can be 
performed to generate a correlation coefficient, or, 
alternatively, a least squares residual, which is a measure 

20 of similarity of the effects of the two drugs. In such an 
embodiment, only one response pathway is scaled to fit the 
drug response data. Thus, in this particular embodiment the 
response R of the second "perturbation" drug is compared to 
the response data of the first drug D according to Eqn. 5, 

25 above, where K=l. 

5.3.2 puTttgHY VE PTP1C&TION 
Following determination of a representation of the drug 
response as a combination of pathway responses, it is 
30 preferable, although optional, to assign a statistical 
significance to the pathway combination determined m step 
506 and to verify the pathways determined to be significant 
in step 507. 

35 statirtiral pignifjcance 

concerning step 506, the statistical significance of a 
pathway combination is determined by comparing the value of 
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the minimum residual determined from the solution of Eqn. 5 
to an expected probability distribution of residuals. The 
less likely the minimum residual is in terms of such a 
distribution, the more significant is the determined pathway 
combination. In the case of the correlation maximization 
method, the same methods can be applied to the maximum found 
in Eqn. 8. In particular, an expected distribution of this 
maximums can be found (as described below) , and the 
significance of the actually obtained maximum determined from 

this distribution. 

An expected probability distribution of residuals can be 
estimated by any method known in the art. Typically, this 
distribution is estimated analytically based on certain a 
priori assumptions concerning input probability 
distributions. Since such analytic estimation is difficult 
in this case, it is preferable to estimate the residual 
distribution by modeling based on a method described by 
Fisher. See, e.g., Conover, 2nd ed. 1980, Practical 
won r »r,,.n«t:ric statistics . John Wiley. This method provides 
an empirical residual distribution by taking permutations or 
random subsets of the input data. In detail, here the input 
can be permuted with respect to the levels of drug exposure. 

According to the preferred method, a residual 
distribution is constructed by repetitively solving Eqn. 5 
with randomized input data and accumulating the residuals to 
form the empirical residual distribution. Thereby, the 
constructed empirical residual distribution arises from 
random data that has the same population statistics as the 
actual data. In detail, first, efther the drug response date 
or the pathway response data (but not both) are randomized in 
step 505 with respect, to the drug exposure levels or the 
perturbation control parameters, respectively. This 
randomization transformation is represented by the following 
transformation. 

0 k (t 2 > - *VW do) 
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I„ Eon. 10. n represents a perturbation independently chosen 

fo/each cellular constituent. Either the drug response or 

the each pathway response (but not both) is rendered 

Moraine to Bqn- 10. Accordingly, the randomized dr», , - 

, . derived from the measured data by 

c pathway response data are derivea 

indent perturbations of the »easure»ent points Second. 

L, 5 is then solved by the chosen nu.eric.1 approxLatlon 

^echnigue in step 504 and the value of the resulting rescue! 

slved These steps are repeated for enough ™**«"~ *» 
„ construct a sufficiently significant expected probability 

distribution of residuals, m order to obtain confidence 

1 4 f. a p-value less than 0.01), 

levels of 99% or better (I.e., a r 

tte n more than 100 randomizations are needed. 

Having constructed the empirical residual dxstrxbutxon, 

1S in step 506, the actually determined residual is compared to 
L constructed distribution and its probability determxned 
in view of that distribution. This probability xs the 
significance assigned to the pathway. In other words the 
statistical significance of any fit of a combxnatxon of 

20 pathways to the drug response is given in the 

embodiment by the smallness of the probabxlxty value that 
randomized data are fit better by the assumed combxnatxon of 
nathwavs than the actual data- 

P I some cases, the pathway combination initially chosen 
25 in step 501 has adequate significance. For example thxs xs 
so if the pathway combination has at least the standard 95% 
probability threshold commonly used in medical " 
so, then this initial pathway combination can verxfxed xn 
step 507 and cellular components assigned to individual 
so biological pathways in step 508. In other cases an 

acceptable significance threshold will not be met at fxrst. 
If so, then, as indicated by arrow 512, it can be 
advantageous to return to step 501 and select a new set of 
caudal pathways in order to find a set meeting the chosen 
35 threshold standard of significance. 

Accordingly, the assigned significance provxdes an 
objective method for assigning significance values and 
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choosing between pathway combinations . This objective method 
of assigning significance allows meaningful identif icatxon of 
pathways from a large set of possible pathways likely to be 
involved in the action of a drug of interest, and provides an 
5 objective basis for halting the search for the additional 
pathways when the model drug response (possibly combining a 
plurality of pathways) attains sufficient objective 

significance. . 

in an alternative use of the significance as determined 

10 above, a single candidate pathway may be tested for 

significance according to two different approaches. In a 
first approach, the model drug response is taken to involve 
only that candidate pathway, and the pathway response data 
along that pathway are compared to the drug response data by 

15 correlation or least-squares residual (as described in 

Section 5.3.1). The significance of the fit, as determined 
by the randomization methods above, is compared to a 
threshold, such as the 95% threshold standard in the medical 
sciences, and the candidate pathway is taken to be a pathway 

20 of drug action if the significance is greater than that 

threshold. . 

in a second approach, the model drug response is assumed 
to involve multiple pathways, including the candidate pathway 
of interest. The pathway response data are then selectively 

25 randomized by randomizing only the pathway data for the 

candidate pathway according to Eon. 10. The significance of 
the model drug response against this selectively randomized 
data is assessed by the previous methods. If this latter 
significance is significantly less than the former 

30 significance of the actual data, then the candidate pathway 
is taken to have significantly improved the model drug 
response. In that case, the pathway is likely to be a 
pathway of action of the drug of interest. 

35 w-^fyJrm Pathway Combinations 

Concerning next step 507, the representation of a drug 
response in terms of pathway responses can be independently 
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verified by the preferred, but optional, steps described in 
this subsection. In the previous steps of this invention 
(steps 510 and 511) , a biological system was perturbed either 
by drug exposure or by perturbations of selected pathways, 
5 but not by both drug exposure and pathway perturbations. In 
steps 504 and 506, the results of drug exposure were fit by a 
combination of the results of selected pathway perturbations, 
and then the statistical significance of this fit was 
estimated. Now in step 507, simultaneous drug exposure and 
10 perturbation of the significant pathways determined in step 
504 are used to verify the that these pathways are indeed the 
actual pathways of drug action. 

Before describing the analytic details of pathway 
verification, the advantages of simultaneous drug exposure 
15 and pathway perturbation are exemplified with respect to the 
situation illustrated in Fig. 6. In Fig. 6., the expression 
of genes G k (for example, transcription state measurements of 
mRNA abundances) is affected by two pathways, one originating 
at protein PI and the other at protein Px. Drug D is assumed 
20 to act on genes G k either by inhibiting Pi or by inhibiting 
Px. If the inhibitory perturbations to the two pathways 
produce similar responses in the genes G,, then even if drug 
D acts only by inhibiting Px, its drug response will be well 
fit in step 504 by inhibitory perturbation 601 to the pathway 
25 originating at PI, and this pathway may be incorrectly 

identified as being the likely pathway of action of drug D. 
This error can be remedied by simultaneous exposure to drug D 
and inhibition of PI or of Px. Exposure to drug D and 
inhibition of PI will not result in a changed drug response, 
30 since the drug response is in fact mediated via Px. However, 
exposure to drug D and inhibition of Px will result in a 
changed drug response, since both the drug and the 
perturbation now act at Px. The different responses to 
simultaneous drug exposure and pathway perturbation in these 
35 two cases allow the correct pathway of action of drug D to be 
unambiguously identified. 
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The general description of verification step 507 begins, 
first with consideration of the case where only one pathway 
is involved in representing the drug response, and follows 
with consideration of the general case of multiple pathways. 
5 in the following, as previously, D k (t.) refers to the response 
of the k'th cellular constituent to the I'th level of drug 
exposure, and R^P,,) refers to the response of the k'th 
cellular constituent in the i'th pathway in response to the 
!.th level of the appropriate perturbation control parameter. 
XO Further, the variable DR refers to the -suits of the 

combined exposure of the biological system to both the drug 
and to a pathway perturbation. In detail, DR*( Py ,tJ refers 
to the response of the k'th cellular constituent in the i'th 
pathway in response to the I'th level of the appropriate 
15 perturbation control parameter and to the m'th level of drug 
exposure. 

in the case of a single pathway of drug action, if the 
drug indeed acts on that pathway then the combined response, 
DR, is given by the following. 
20 DR^iPi.i.tJ " Ri.^Pi.x + «i t - ) (11> 

where «, is the best scaling parameter determined for this 
pathway. A linear scaling is assumed here; adaptation to 
m ore general scaling transformations is apparent from the 
« preceding description. DR has the foregoing form **<~™«' ™ 
this case, both the drug and the perturbation act on the same 
constituents of the pathway, in particular on their 
originating constituents, and the response of the pathway is 
due to the summed effect. 
30 Th e behavior of Eqn. 11 is illustrated in Fig. 7A, 

where, for purposes of example only, D and R have been 
modeled by the Hill function. Characteristically, the 
function DR in this case saturates at substantially the same 
values for large drug exposure (drug -.titrations"), near 
35 as terisk 701, for large perturbation, near asterisk 702, and 
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for the combination of large drug exposure and large 
perturbations, near open circle 703. 

If instead, the drug acts on a different pathway, not 
on the i'th pathway, then the combined response, DR, xs gxven 
5 by the following. 

The response has this form in this case because the drug acts 
Inxy on cellular constituents outside of the i'th pathway. 

XO sin L the pathway perturbation is limited to cellular 

a nstituents in the i'th pathway, it acts independently of 
the drug. Consequently, the action of the drug and the 
Petition are dependent and their effects 
on cellular constituents. (The effects may be combxned 

XS Teeded according to the other combination functions dxscussed 

in Section 5-2) 

The behavior of F.gn. 12 (assuming a. equals 1) U 
illustrated in Pi,. 7B. where, for purposes of example only, 
D ,„ d » have again heen modeled by the Hill functron In 
» Lis case, the funotion DR saturates at substantially the 
72 values for large drug exposure <dru, .titrations.) near 
asteris* 70,, and for large perturbation, near '0 5 . 
But for the combination of large drug exposure and large 
P^turbations. this function reaches substantial y ^gher 
» near open circle ,06 than at either asterisk 704 or 

" 5 , .here only the drug exposure or the perturbation alone 

U nr.""' it is possible to distinguish the cases 
represented by Figs. 7A and 7B by performing -»•»-«» £ 
3. verification conditions where both the drug exposure and the 
pathway perturbation are simultaneously present. ««* 
exptriLnts are preferably at drug exposure and P-*-£- 
values represented by the open circles in Figs- 'A and 7B. 
and most preferably at open circles 703 and 706. Less 
» preferably, these experiments are performed at values ih the 
Interior of the surfaces illustrated in these figures, 
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especially in the region bounded by lines between asterisk 
7l and 702 and open circle 703 in Fig. 7A. and in the region 
bounded by lines between asterisks 703 and 704 and open 
Trcte 705 in Fig. 7B. It is also clear that it would not be 
5 possible to distinguish these cases solely by performing 
experiments in which only one of the drug exposure or 
perturbation control values are non-zero. The curves -g. 
7A between asterisk 710 and either asterisk 701 or asterisk 
702 are substantially the same as the curves in Fig. 7B 
10 between asterisk 711 and either asterisk 704 or asterisk 705. 
" xn sugary, the identification of the i-th pathway as 

the pathway of drug action is verified if experimental 
results more closely resemble Fig. 7A than Fig. 7B. 

considering the case of multiple pathway in general 
15 TMP „t ) refers to the total response of the k'th cellular 
^nst tult in response to the Lth level of the aPP-Priate 
perturbation control parameter in the i'th pathway and to the 
lT th le vel of drug exposure. TR is given by the following 
equation if the drug acts through the indicated pathways. 

20 TR k \p t .i. t.) - ? l*i.*<Pi.i- U = J * i '* {Pi ' 2 + ajtJ (13> 

TR is given by the following equation if the drug does not 
act through the indicated pathways. 

An objective choice between these two possibilities can 
be made in a manner similar to the statistical confidence 
estimation method described in the previous - bsectio - 

30 values for TMPu.tJ. the left-hand side of Eqns. 13 and 14, 
are experimentally determined for various Preferred 
verification conditions, and values for the right-hand side 
are computed from the measurements of the drug response and 
the pathway responses in steps 510 and 511 and from the 

35 deternination of the optimum scaling parameters in step 504. 
The residuals for these equations, that is the sum of the 
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o „f the differences of the left- and right-hand sides, 

lesser residual is the objective choice. 

The statistical significance of the residuals can be 
^Ited bv first, estimating a probability distribution of 
5 The estimated residual probability distribution 

Is Terlined by repeatedly randomizing the right hand sides 
of Eqns. 13 and 14 with respect to the perturbation control 
parameter index and the drug exposure index and then 
10 refuting the residuals. The statistical significance of 
L actual residuals are then determined with respect to this 
model probability distribution. 

Typically, only a small number of verification 
conditions are needed to confirm with significance the 
15 exilnce of a pathway which was determined to be significant 

^ St ln Tinal optional step 508, after drug responses have 
b een represented as a combination of pathway responses m 
step 504 and best-fit scaling parameters have been 
20 accordingly determined, each affected cellular constituent 
20 can be assigned to the pathway with which its drug response 
is most correlated. Optionally, the pathways have also been 
declared significant in step 506 based, for exam pie on a 
significance threshold, such as the standard 9 5% probabi ity 
25 threshold often used in the medical sciences 

cellular constituent its drug response, Djt.) , is correla 
with the individual response of that constituent in the 
response data of each pathway. 

p. = corx (^(tj) Ri.xteitJ > 
30 XD^t J R itk Cjfci) (15) 

= "(E WWVlRaitA,))')* 



„ in E gn. 15, p u is the correlation of the drug 
35 K-th cellular constituent with its response 

pathway. The K-th cellular constituent is assigned to the 
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i-th pathway where p u is larger than , u for all 1 not equal 
to i Similarly to the previous significance estimations, 
the statistical significance of this correlation can be 
determined by randomizing the drug response data xn Eqn. 15. 

5 r^^C 

5 .3.3 TMPT f EMEM T *TTOK BYS ^If p AND METHODS 

The analytic methods described in the previous 
subsections can preferably be implemented by use of the 
following computer systems and according to the following 

10 programs and methods. Fig. 9 illustrates an exemplary 

computer system suitable for implementation of the analytic 
methods of this invention, computer system 901 is 
illustrated as comprising internal components and being 
linked to external components. The internal components of 

15 this computer system include processor element 902 

interconnected with main memory 903. Tor example, computer 
system 901 can be an Intel Pentium»-based processor of 200 
Mhz or greater clock rate and with 32 MB or more of mam 
memory. 

20 The external components include mass storage 904. This 

m ass storage can be one or more hard disks (which are 
typically packaged together with the processor and memory) . 
such hard disks are typically of 1 GB or greater storage 
capacity. Other external components include user interface 

25 device 905, which can be a monitor and keyboard, together 
with pointing device 906, which can be a "mouse", or other 
graphic input devices (not illustrated) . Typically, computer 
system 901 is also linked to network link 907, which can be 
part of an Ethernet link to other local computer systems, 

30 remote computer systems, or wide area communication networks, 
such as the Internet. This network link allows computer 
system 901 to sharedata and processing tasks with other 

computer systems. 

Loaded into memory during operation of this system are 
35 several software components, which are both standard in the 
art and special to the instant invention. These software 
components collectively cause the computer system to function 
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according to the methods of this invention. These software 
components are typically stored on mass storage 904. 
software component 910 represents the operating system, which 
is responsible for managing computer system 901 and its 
5 network interconnections. This operating system can be of 
the Microsoft Windows™ family, such as Windows 95, Windows 
98 or Windows NT. Software component 911 represents common 
languages and functions conveniently present on this system 
to assist programs implementing the methods specific to this 
10 invention. Languages that can be used to program the 
analytic methods of this invention include C and C++, or, 
less preferably, JAVA® • Most preferably, the methods of thxs 
invention are programmed in mathematical software packages 
which allow symbolic entry of equations and high-level 
15 specification of processing, including algorithms to be used, 
thereby freeing a user of the need to procedurally program 
individual equations or algorithms. Such packages include 
Matlab from Mathworks (Natick, MA) , Mathematica from Wolfram 
Research (Champaign, Illinois), or S-Plus from Math Soft 

20 (Seattle, Washington). 

Accordingly, software components 912 and 913 represent 
the analytic methods of this invention as programmed xn a 
procedural language or symbolic package. Component 912 
represents programs implementing the methods for drug 
25 response representation described in Section 5.3.1, and 
component 913 represents programs implementing the methods 
for assessing the significance of a drug response 
representation described in Section 5.3.2. 

in an exemplary implementation, to practice the methods 
30 of this invention, a user first loads drug response data and 
pathway response data into computer system 901. These data 
can be directly entered by the user from monitor and keyboard 
905, or from other computer systems linked by network 
connection 907, or on removable storage media (not 
35 illustrated) . Next, the user causes execution of drug 
response representation software 912, after optionally 
supplying initial pathways of interest, followed by execution 
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of significance assessment software 913. Thereby, the user 
obtains a model drug response and its statistical 
significance. Finally, as described in Section 5.3.2, the 
user can iteratively improve on a first model drug response 
5 according to several alternatives by causing repetitive and 
iterative execution of the drug response representation 
software and the statistical significance assessment 
software. 

Alternative systems and methods for implementing the 
10 analytic methods of this invention will be apparent to one of 
skill in the art and are intended to be comprehended within 
the accompanying claims. In particular, the accompanying 
claims are intended to include the alternative program 
structures for implementing the methods of this invention 
1S that will be readily apparent to one of skill in the art. 

5.4 PATHWAY ppWTnBHATIOH METHODS 
Methods for targeted perturbation of biological pathways 
at various levels of a cell are increasingly widely known and 
20 applied in the art. Any such methods that are capable of 
specifically targeting and controllably modifying (e.g., 
either by a graded increase or activation or by a graded 
decrease or inhibition) specific cellular constituents (e.g., 
gene expression, RNA concentrations, protein abundances, 
25 protein activities, or so forth) can be employed in 

performing pathway perturbations, controllable modifications 
of cellular constituents consequentially controllably perturb 
pathways originating at the modified cellular constituents. 
Such pathways originating at specific cellular constituents 
30 are preferably employed to represent drug action m this 
invention. Preferable modification methods are capable of 
individually targeting each of a plurality of cellular 
constituents and most preferably a substantial fraction of 
such cellular constituents. 
3S The following methods are exemplary of those that can be 

used to modify cellular constituents and thereby to produce 
pathway perturbations which generate the pathway responses 
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„«.d in the steps of the methods of this invention as 

— i "** nti °" is aaaptable !! othet 

.Itnods for making controllable perturbations to pathways, 
^especially to cellular constituents fro. which pathways 

5 ° rlgi ^way perturbations are preferably made in cells of 
cell types derived fro, any organism for which genomic or 
jessed sequence information is available and for which 
ZZs are available that permit controllably »°<"«~«°" 
„ of the expression of specific genes. Genome sequencing is 
gently underway for several euKaryotic organisms, 
including humans, nematodes, Arab! dopsis , and flies. In a 
'referred embodiment, the invention is carried out using a 
yeast, with Saccbaromyces cerevisi.e most preferred because 
15 the sequence of the entire genome of a S. cerev.szae strain 
has "en determined. In addition, well-established methods 
are available for controllably modifying expression^ year 
genes. A preferred strain of yeast is a S. ««™^~» 
lor Which yeast genomic sequence is Known, such as sttam 
20 S288C or substantially isogeneic derivatives 

. a , Nature 369, 371-8 ,1994)= P.».*.S- 92.3809-13 ,1995), 
E .M B.O. J. 13:5,95-5809 (1994,. Science 
,1994), S.H.B.O. J. 15 = 2031-49 (1996), .11 of which are 
neorporated herein. However, other strains may be used - 
26 well, veast strains are available from American Type culture 
2 Action, KocKviile. MD 20852. standard ^ 
manipulating yeast are described in c. Kaiser, S 
. A. Mitchell. 1994, l lrt h nnr )n V-ast W"' » SPr i "' 1 
m r| — r-wtorv Man».X. Cold spring Harbor 

30 oratory Press. He. yorRj^nd Sherman et al.. 1986 MM*. 

Laboratory, cold spring Harbor. He. York both of which are 
incorporated by reference in their entirety and for .11 

35 ^"To" exemplary methods described in the following include 
use of titrable expression systems, use of transection or 
viral transduction systems, direct modifications to *HA 
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abundances or activities, direct modifications of protein 
abundances, and direct modification of protein activities 
including use of drugs (or chemical moieties in general) with 
specific known action. 

5 

T 4tT-*tabl «> KvprQSS^"" Systems 

Any of the several known titratable, or equivalents 
controllable, expression systems available foil use in the 
budding yeast Saccharomyces cerevisiae are adaptable to this 
10 invention (Mumberg et al., 1994, Regulatable promoter of 
Saccharomyces cerevisiae: comparison of transcriptional 
activity and their use for heterologous expression, Nucl. 
Acids Res. 22:5767-5768). Usually, gene expression is 
controlled by transcriptional controls, with the promoter of 
15 the gene to be controlled replaced on its chromosome by a 
controllable, exogenous promoter. The most commonly used 
controllable promoter in yeast is the GAL1 promoter (Johnston 
et al., 1984, Sequences that regulate the divergent GAL1- 
6 ALIO promoter in saccharomyces cerevisiae, Mol Cell. Biol.^ 
20 8-1440-1448). The GAL1 promoter is strongly repressed by the 
presence of glucose in the growth medium, and is gradually 
switched on in a graded manner to high levels of expression 
by the decreasing abundance of glucose and the presence of 
galactose. The GAL1 promoter usually allows a 5-100 fold 
25 range of expression control on a gene of interest 

Other frequently used promoter systems include the MET25 
. promoter (Kerjan et al., 1986, Nucleotide sequence of the 
Saccharomyces cerevisiae MET25 gene, Nucl. Acids. Res. 
14-7861-7871), which is induced by the absence of methionine 
30 in the growth medium, and the CUPl promoter, which is induced 
by copper (Mascorro-Gallardo et al., 1996, construction of a 
CUPl promoter-based vector to modulate gene expression in 
, Saccharomyces cerevisiae, Gene 172:169-170). All of these 

promoter systems are controllable in that gene expression can 
, as be incrementally controlled by incremental changes in the 
abundances of a controlling moiety in the growth medium. 
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. of the above listed expression systems 
One disadvantage of t (effected by, e.g., 

U that control of cer tain amino acids,, 

chants in carbon source, ^idogy which 

o£t en causes ^^^ession levels of other genes. A 
5 independently alter the P ^ 

recently developed et 19 „. 

deviates this problem to^ ^.^tabXe promoter 
A set of vectors with a ln sacch «<»yces 

*~ « ^rtt't -»X ^ Tet propter, adopted 
„ cerevislae. Yeast "V. (0o6se n et .1., »«, 

£ro m mammalian expression sys Jn ^^alian 

inscriptions! Z «■ 

cells, Proc. Nat. Acad. tetracycline or the 

ov the concentration of the ^ in the 

„ structurally ^nrTe ProX Educes a high level of 

— ^activity, 
expression, and tu ,„ re ssion of promoter activity 

aoxycycline causes -"-"'^Li can be achieved in the 
intermediate levels gene levels ot drug. 

„ steady state by adaitl °" that give maximal 

Furthermore, levels of doxycy ^ crograBS/ml) have no 

repression of promoter £=£J^ 1 „„ wild type yeast 
significant effect on the growth ^ ^ 

cells (oari et .1., • £or Boau iated gene 

„ t«^>~~*^J~ZZZ.. yeast 13 =e 3 7-34S,. 

expression in ~**™*ZZZ ~- o£ '"r^" 5 • 

m mammalian cells, sever ^ creating 

expression of genes are avaUable <SP ^ 12 . 1S1 . 1B7) . 

conditional "* M ^'"jT^ m is widely used, both in 
30 as mentioned ^Z^T^. in «bich addition of 
its original font, the forw d in ^ newer 

doX ycycline 6timUlat6S 

-reverse" system, in which doxy y ^ ^ ^ ^ 

transcription (Gossen et .1., < ^ Aci<Js . ReB . 

35 USA 89:5547-5551; Hoff^net ^ ^ ^ Acad . Sci. 

25:1078-1079; Hofmann et al. , x Qf virolog y 
USA 83:5185-5190; Paulus et al.. 
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_ ftnlv US ed controllable promoter 
• MO !:: r n ^ is the ecdyeone-inducible system 
system in mammalian ~™ . (So et al ., 19 96. Ecdysone- 

developed by Evans and colleag > transgenic 

inducible — — » l ; n ol l ~ ~« U . where 
, Proc. -at- Ac,*- -1- ~ o£ „ risterone ad ded to 

expression is "» expression can be modulated 

the cultured ce Is (CID) system 

^ing tbe """^"^^U a „ d collogues (Belsha. et 
developed by schrexber. crabtr ^ ^ ^ 

„ .1., 1»»6. controlling P'°^ n ^ lnduces 

iocalisation »ith a synthet „ x, ^ ^ os& 

h eterodi»erl,atxon o£ conditional mutations 

M! 4604-4607; spencer. 19,6, Cre* 9 ^ 
i„ mammals. Trends Genet. 12.1" > ^ 

15 yeast, m this system, the gene of interest > 

control of tbe CXD-responsive promoter and tran ^ 

cells expressing ™J™Z\T^ >*~ — 

of a DMA-binding domain fused t « ans criptional 

Th e other hybrid protein ""^"VcxP inducing 

::rtrbt™i;— ^-j^r^r^— 

- - r "the mammalian £~ 

— runTertr^riVthe controllable 

s^-rr ^esL — rr^:o a — 

an antibiotic resistanc< , gene U transfe ^ 

" amMUan ""'/drug r^tTolcnies ere selected and 
the genome, and <™£^»-**> o£ the regulated gene, 
screened for appropriate expr inserted into an 

A1 ternative ly , the 

35 epical plas»i d ^^^^^/virus necessary for 
contains components of the Epstein 

plasmid replication. 
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— ^-jr^TLr^t- or delated. 

5 which the endogenous h " ° ar e uell known to 

Methods for producing such knock o ^ ^ ^ 

those of skill in the art, ^ proc . 
..velopment -«^' 0 ^„ ; aamires-solis * al 
„atl. Acad. ' ^ Thomas at 1.. »<". 

„ 1993, Methods Enzymol. 225.855 

Cell 51!503-512. 

^^^^^^o t target genes can 
Transection or viral m biological pathways 

15 introduce controllers 'f^"^ . ctiM1 or transduction 
in mammalian cells, ^^lls «-t do not naturally 
of a target gene can be use I « ^ 
express the target gene <* norMlly expr essin, 

cells can he •~~*"£^ B . can he specifically 
20 the target gene o, ^ gene o£ int ercst can he 
mutated in the ^"^J^ expression plasmlds, for 
cloned into one « - (invitrogen, Inc.) or 

example, the P cD»A3 . +/ ^ ^ non -expressing 

retroviral vectors, and oells expr essing the 

25 host cells. " setection for a drug 

target gene may he "Repression vector. The 

resistance marker encodec by the^ * telated « the 

xevel of gene transcr !P l s ^ tte ef£TCt s of varying 
transfection dosage. In this " y ' >ted . 
„ levels of the target gene may £ in ^ „ ^ 

A particular £ P-tein tyrosine 

search for drugs that target t receptor 
kinase, Ick, a key component of tn InvolveBent of the 

activation pathway t*" 4 "™ * *^ T cell signaling and 
« protein tyrosine kinase p« <lcW „,„,.„.,. Inhibitors 
thymocyte development, Mv . I Ual immunosuppressive 

of this enzyme are of interest 
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„- cC ovcry of a Novel, Potent, and src 
drug s <Han*e 3H, ^ * inhibi tor, a. Biol Chem 

family-elective tyrosine ^ Jurkat T cell line 

271(2) :695-701). A „ ot expresB 1* Kinase (Straus 

(J caMl) U available that does n of the IcK 

5 it al., G 7 tiC ^:L d uction though the T cell 

tyrosine kinase in signal tran introduction 

^en receptor, or transduction 

of the loX gene xnto JCaMi y of , cell 

permits specific ~ ^ * 

10 activation regulated by the 1 ^ of 

transf ection or ^ sd ~ me thod is generally 
perturbation, is dose reia . expression or 

protein abundances in cells n 
15 to be perturbed. 

■ „ Ml lbjmdanpes_2I_6cil!£i£iSS 
Methods of nodding "» » r ibozy»es, antleense 

gently ^oa " al., ««. — — « 41 

20 species, and ^ or expoBu re „ f a cell to 

entXTrtsTnt^llable perturbation - - 

abundances. capable of catalyzing RH* 

Rib ozy»es are >» v^ch are oj^ 236!1 „ 2 . 1539J per 

25 cleavage reactions. (Cecn. • p^Ushed October 4, 

In ternational Publication WO SO/1^ ^ 

1990 ; server et .1.. 1«° * rlboI!vll „ can be designed to 
-Hairpin- and "^^J^J^t *n. «— 

specifically cleave a P*" 1 ^",, short ,„lecules with 
„ been established for the —in, other P*» 

ribozyme activity, which ar * mA oa „ be 

.oleoulee in a highly (H aseloff et .1.. 

targeted to virtually all Kinds 

"... "^•^•U.. 23 9 :2SS-2SS> 

35 223:228-230; Koixu»i et al., ■ in auoing 

R ibozy»e .ethcds inv°lve e^ n^ce^ ^ 
expression in a cell, etc. 
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molecules. (Grass! ana Marini, 1996, Annals of Medicine 28: 
L-510; Gibson, 1996, Cancer and Metastasis Reviews 15. 287 

29?> 'Ribozymes can be routinely expressed in vivo in 

w ~,+ a ivtically effective in cleaving 
5 sufficient number to be catalyticaliy err 

«.i »nd thereby modifying mRNA abundances in a cell. 

^ ~e mediated destruction of RNA m 
vivo The EMBO 8:3861-3866). In particular, a ribozyme 
coding DMA sequence, designed according to the previous . rules 

XO Ind synthesized, for example, by standard phosphoramidite 
chemistry, can be ligated into a restriction 
the anticodon stem and loop of a gene encoding a tRNA which 
can then be transformed into and expressed m a cell of 
interest by methods routine in the art. Preferably, an 
in Y . „ _ a glucocorticoid or a tetracycline 

15 inducible promoter (e.g., a 

iL«rt:i is also introduced into this construct so 
response element) is also in cont rolled. tDNi 

that ribozyme expression can be selectively 
genes (i.e., genes encoding tRNAs) are useful in this 
application because of their small size, high rate of 

to cleave virtually any mRNA sequence, and a cell can^ be 
routinely transformed with DMA coding 
sequences such that a controllable and catalyticaliy 
25 effective amount of the ribozyme is expressed -or * 
the abundance of virtually any RNA species m a cell can 

PertU In e Lther embodiment, activity of a target RNA 
(preferable mRNA) species, specifically its rate of 

30 translation, can be controllably inhibited by the 

controllable application of antisense nucleic acids. An 
..antisense.. nucleic acid as used herein refers t a nucleic 
acid capable of hybridizing to a sequence-specific (e.g., 
non-Doly A) portion of the target RNA, for example its 

3S Ration Initiation region, by virtue of -e -quence 
complementarity to a coding and/or non-coding region. The 
antisense nucleic acids of the invention can be 
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oligonucleotides that are double-stranded or single-stranded, 
RNA or DNA or a modification or derivative thereof, which can 
be directly administered in a controllable manner to a cell 
or which can be produced intracellular^ by transcription of 
5 exogenous, introduced sequences in controllable quantities 
sufficient to perturb translation of the target RNA. 

Preferably, antisense nucleic acids are of at least six 
nucleotides and are preferably oligonucleotides (ranging from 
6 to about 200 oligonucleotides). In specific aspects, the 
10 oligonucleotide is at least 10 nucleotides, at least 15 
nuclebtides, at least 100 nucleotides, or at least 200 
nucleotides. The oligonucleotides can be DNA or RNA or 
chimeric mixtures or derivatives or modified versions 
thereof, single-stranded or double-stranded. The 
15 oligonucleotide can be modified at the base moiety, sugar 
moiety, or phosphate backbone. The oligonucleotide may 
include other appending groups such as peptides, or agents 
facilitating transport across the cell membrane (see, e.g., 
Letsinger et al., 1989, Proc. Natl. Acad. Sci. U.S.A. 86: 
20 6553-6556; Lemaitre et al., 1987, Proc. Natl. Acad. Sci. 84: 
648-652; PCT Publication No. WO 88/09810, published December 
15, 1988), hybridization-triggered cleavage agents (see, 
e.g., Krol et al., 1988, BioTechniques 6: 958-976) or 
intercalating agents (see, e.g., Zon, 1988, Pharm. Res. 5: 
25 539-549) . 

in a preferred aspect of the invention, an antisense 
oligonucleotide is provided, preferably as single-stranded 
DNA. The oligonucleotide may be modified at any position on 
its structure with constituents generally known in the art. 
30 The antisense oligonucleotides may comprise at least one 

modified base moiety which is selected from the group 
including but not limited to 5-f luorouracil, 5-bromouracil, 
5-chlorouracil, 5-iodouracil , hypoxanthine, xanthine, 

4- acetylcytosine, 5- (carboxyhydroxylmethyl) uracil, 
35 5-carboxymethylaminomethyl-2-thiouridine, 

5- carboxymethylaminomethyluracil, dihydrouracil, beta-D- 
galactosylqueosine, inosine, N6-isopentenyladenine, 
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5 .etnyx ,„,„,!„. 5-^«.thoitycarboxyinethyluracil, 

5 beta-D-nannosylqueosine, » > ,...„,<_- 

5 - m ethoxyuracil, ^.ethylthio-W-isopentenyxadenxne, 

LacU^lyacetic acid <v, , wybutoxoeine, pseudouracxl. 

Cosine, ,-thxocytosine, 5 -.ethyl- 2 -thxouracxl 

^thiouracil. 4-thiouracil , 5-methyluracxl. uracxl- 
„ 5-o*y.c.tic acid .ethylester. uraoil- 5 -o*yacetic acid (v,, 

LeLyl-3-thiouracil. 3- ( 3- m ino- 3 -B-2-c»rbc Xy prcpyl) 

uracxl. <acp3)v, and 2,6-dia.inopurine. 

a ancLer embodx.«nt, the oligonucleotide comprises at 
Xeast one modified sugar moiety selected fro. the group 
xs including, hut not limited to, arabinose, 2 -f luoroarabxnose, 

WlUl r;eranoth:r^odi.e„t. the oligonucleotide copses 
«t least one modified phosphate backbone selected 
"up consisting of a phosphorothioate. a phosphorodxthxoate, 
J0 , phosphor.»idothioate, a phosphoramidate, a 

phosphordiamidate. a methylphosphonate. an alKyl . 
phosphotriester. and a for.ac.tal or analog *"ereof 

In yet another e»>bodi.ent, the olxgonucleotxde xs a 
. •> An a-anomeric oligonucleotide 

„ foT Z2ZZZZLZ ~>» -» — ntary 
i„™nich. contrary to the usual 6-units. the strands run 
Mallei to each other (Gautier et .1.. »ucl. Acxds 

Pps 15: 6625-6641)* 

" ^ hybridation-triggered 

"^S: ^ -ids of the invention comprise, 
sequence complementary to at least a portion of a target «Ha 
However absolute complementarity, .although 
" ~- .Tnot'reguired. . seguence -complementary to at 

leasH Portion of an »* referred to herein, means a 
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sequence having sufficient complementary to be 
hybridize with the KNA, forming a stable duplex xn the case 
of oouble-stranded antisense nucleic acids, a single strand 
of the duplex DNA may thus be tested, or triplex formatxon 
5 1 be assayed. The ability to hybrids will depend on both 
Te degree of complementarity and the length of the antxsense 
nucleic acid. Generally, the longer the hybridizing nuclexc 
acid, the more base mismatches with a target RNA it may 
contain and still form a stable duplex (or triplex, as the 
10 case may be) . One skilled in the art can ascertain a 

TolerLL degree of mismatch by use of standard procedures to 
determine the melting point of the hybridized complex. The 
amount of antisense nucleic acid that will be ef fectxve xn 
the inhibiting translation of the target RNA can be 
15 determined by standard assay techniques. 

Oligonucleotides of the invention may be synthesxzed by 
standard methods known in the art, e.g. by use of an 
Elated DNA synthesizer (such as are commercially avaxlable 
from Biosearch, Applied Biosystems, etc.). As 
20 phosphorothioate oligonucleotides may be synthesized by the 
method of Stein et al. (1988, Nucl. Acids Res. 16: 3209), 
lethylphosphonate oligonucleotides can be prepared by us of 
controlled pore glass polymer supports (Sarin et al 1988, 
Proc. Natl. Acad. Sci. U.S.A. 85: 7448-7451), etc. In 
25 another embodiment, the oligonucleotide xs a 2'-°" 

^ribonucleotide (Inoue et al., 1987, Nucl. Acids Res 
15: 6131-6148), or a chimeric RNA-DNA analog (Inoue et al., 
1987, FEBS Lett. 215: 327-330). 

The synthesized antisense oligonucleotides can then be 
30 administered to a cell in a controlled manner For example, 
the antisense oligonucleotides can be placed in the growth 
Environment of the cell at controlled levels where they may 
- be taken up by the cell. The uptake of the antxsense 

be taken up y assist ed by use of methods well known 

oligonucleotides can be assisted Dy 

- 35 in the art. , , 

in an alternative embodiment, the antisense nuclexc 

acids of the invention are controllably expressed 
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intracellulars by transcription fro* an exogenous seance. 
For example, a vector can be introduced xn vivo such that xt 
is taken up by a cell, within which cell the vector or a 
portion thereof is transcribed, producing an antisense 
5 nucleic acid (RNA) of the invention. Such a 

contain a sequence encoding the antisense nuclexc acid. Such 
a vector can remain episomal or become chromosomally 
integrated, as long as it can be transcribed to produce the 
desired antisense RNA. such vectors can be constructed by 
XO recombinant DNA technology methods standard in ^ «*• 
vectors can be plasmid, viral, or others known in the art, 
used for replication and expression in mammalian cells. 
Expression of the sequences encoding the antisense RNAs can 
be by any promoter known in the art to act in a cell of 
15 interest! Such promoters can be inducible or constitutxve. 
" Tos ; drably, promoters are controllable or 

the administration of an exogenous moiety xn order to achxeve 
controlled expression of the antisense olxgonucleotxde. Such 
controllable promoters include the Tet promoter. Less 
20 preferably usable promoters for mammalian cells xnclude, but 
are not limited to: the SV40 early promoter region (Bemoist 
and Chambon, 1981, Nature 290: 304-310), the promoter 
contained in the 3' long terminal repeat of Rous -rcoma 
virus (Vamamoto et al., 1980, Cell 22: 787-797) the herpes 
25 thymidine kinase promoter (Wagner et al., 1981, Proc. Natl. 
TZ. Sci. U.S.A. 78: 1441-1445), the regulatory sequences of 
Te metallothionein gene (Brinster et al., 1982, Nature 296: 

"""Iherefore, antisense nucleic acids can be routinely 
30 designed to target virtually any mRNA sequence, and a cell 
can I routinely transformed with or exposed to 
coding for such antisense sequences such that an effective 
and controllable amount of the antisense nuclexc acxd xs 
expressed. Accordingly the translation of virtually any RNA 
35 species in a cell can be controllably perturbed. 

Finally, in a further embodiment, RNA aptamers can be 
introduced into or expressed in a cell. RNA aptamers are 
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specific M ligands « P™*^- -* - £ « Zt^r 
specifically inhibit their translation. 
. nf H-fl'fY^T" "'■"toin Bhundnnces 

S HStll ^c7of modifying protein abundances include xnter 
, • those altering protein degradation rates and those 
""in, antildi- (wnich bind to proteins affecting abundances 
" acttvittet of native target protein species, increase 
„ (or decreasing, the degradation rates of a protein species 
decreases (or increases, the abundance of that specxes. 
ZZ for controllabiy increasing the de^adation n*. of a 
target protein in response to elevated te^erature and/or 
exposure to a particular drug, which are known m the art. 
15 can Z employed in this invention. For example, one such 
" ZL employs a heat-inducihl. or drug — 

aegron, which is an H-terminal protein fragment ^ <™"~" 
^degradation signal promoting rapid protein dagr.dat.on at a 
nigher temperature (..,.. 3" e) and which » hidden to 
- /i»or.dation at a lower temperature (e.g., 

„ prevent rap.d d^adatxon ^aW). Such an 

23° C) (Dolmen et. al, is*** Sl,ie " 

• »w. mure* a variant of murxne 
exemplary degron is Arg-DHFR , a vara. 

dihydrofolate reductase in which the N-termxnal Val xs 

«* « «* «" Pr ° ^ POSiti ° n " 18 ^ f orT 
25 « According to this method, for example, a gene for a 

tlrget protl! P. *- ^ standard gene targetxng 

^l^own In the art <y*l* et al., 1**. 

* 4.w ^ w H Freeman and Co., New xorie, 

?£S!£Z£?&\ gen. ceding for the fusion protein 

uhigultin is rapidly cleaved after 

H-terminal deoron. « lower temperatures lysines ^l»texT»l 
to arg-DHT*- are not exposed, ubiguitlnation of 
protein does not occur, degradation is slow, and active 
35 target protein levels are high. « higher ^peratures^n 
the absence of methotrexate,, lysines 

,re exposed, ubiguitination of the fusion protein occurs, 
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degradation is rapid, and active target protein levels are 

Teat activation of degradation is controllably blocked 
Exposure methotrexate. This method is adaptable to other 
N-terlinal degrons which are responsive to other inducing 
5 factors, such as drugs and temperature changes. 

Target protein abundances and also, directly or 
indirectly, their activities can also be decreased by 
(neutralizing) antibodies. By providing for 
exposure to such antibodies, protein abundances activities 
xo Z be controllably modified. For example, antibodies to 
suitable epitopes on protein surfaces may decrease the 
abundance, and thereby indirectly decrease the activity of 
the wild-type active form of a target protein by aggregating 
active forms into complexes with less or minimal activity as 
15 compared to the wild-type unaggregated wild-type form. 
Alternately, antibodies may directly decrease protein 
acuity by, e.g., interacting directly with active sites or 
by blocking access of substrates to active sites, 
conversely, in certain cases, (activating) antibodies may 
20 also interact with proteins and their active 

increase resulting activity. In either case, antibodies (of 
the various types to be described) can be raised against 
S pecif ic protein species (by the methods to be 
tLir effects screened. The effects of the antibodies can be 
25 Tss yed and suitable antibodies selected that raise or lower 
the target protein species concentration and/or activity. 
Such assays involve introducing antibodies into a cell (see 
below) , and assaying the concentration of the 
amount or activities of the target protein by standard means 
30 (such as immunoassays) known in the art. The net activity of 
L wild-type form can be assayed by assay means appropriate 
to the known activity of the target protein. 

Antibodies can be introduced into cells in numerous 

fashions, including, for example, microinjection of 
rasnioitb, ^ Immunology Today 

35 antibodies into a cell (Morgan et al., 1988, Immun gy 
9:84-86) or transforming hybridoma mRNA encoding a desired 
antibody into a cell (Burke et al., 1984, Cell 36:847-858). 
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I„ a further technique, recombinant antibodies can be 
engineering and ectopically expressed in a wide variety of 
non-lymphoid cell types to bind to target proteins as well as 
to block target protein activities (Biocca et al, 1995, 
5 Trends in Cell Biology 5:248-252). Preferably, expression of 
the antibody is under control of a controllable promoter, 
such as the Tet promoter. A first step is the selection of a 
particular monoclonal antibody with appropriate specificity 
to the target protein (see below) . Then sequences encoding 
.10 the variable regions of the selected antibody can be cloned 
into various engineered antibody formats, including, for 
example, whole antibody, Fab fragments, Fv fragments, single 
chain Fv fragments <V„ and V L regions united by a peptide 
linker) ("ScFv" fragments) , diabodies (two associated ScFv 
15 fragments with different specificities) , and so forth (Hayden 
et al., 1997, Current Opinion in Immunology 9:210-212). 
Intracellular^ expressed antibodies of the various formats 
can be targeted into cellular compartments (e.g., the 
cytoplasm, the nucleus, the mitochondria, etc.) by expressing 
20 them as fusions with the various known intracellular leader 
sequences (Bradbury et al., 1995, ftn Hhod y Knqineerinq (vol. 
2) (Borrebaeck ed.), PP 295-361, IRL Press). In particular, 
the ScFv format appears to be particularly suitable for 
cytoplasmic targeting. 
25 Antibody types include, but are not limited to, 

polyclonal, monoclonal, chimeric, single chain, Fab 
fragments, and an Fab expression library. Various procedures 
known in the art may be used for the production of polyclonal 
antibodies to a target protein. For production of the 
30 antibody, various host animals can be immunized by injection 
with the target protein, such host animals include, but are 
not limited to, rabbits, mice, rats, etc. Various adjuvants 
can be used to increase the immunological response, depending 
on the host species, and include, but are not limited to, 
35 Freund's (complete and incomplete), mineral gels such as 
aluminum hydroxide, surface active substances such as 
lysolecithin, pluronic polyols, polyanions, peptides, oil 
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emulsions, dinitrophenol. an, P— ^» 
adjuvants such as bacillus calmette-Guerm (BCG) and 

""TZZZZZ —1 antibodies — tte 

. , . . /i ate. Mature 256: 495— 45*/) » w*b 
bv Kohler and Milstem (1975, Nature ^ 
by Roni« R _ ce ii hybridoma technique 

» T.r»« e ^ ^ ^ «• - - *- 

SSLT^T- produce human monoclonal antibodies 

^ L i—;! Helena! antibodies can * 

" ^Z l lT-Z animals utilising recent technology 
^,0890/02545). According to the invention. hu»an 
^blies may be used and can be obtained « ^ ^ 

hybridomas <Cote et al.. «... «~. ^ti^v virus 
9026-2030) , or by transforming human B cells wi« 

oroduction of -chi»eric antibodies" (Horrrson et »!.. 1»M, 

rrtti. xcad. .I. « si. «^»' i xsLr» ;• 

ific fir Le target protein together with genes from a 
Zl tlL^olecule of appropriate biological act^ty 
,. can be used, such antibodies are within the -ope of th!s 

inVen MdUionally, where monoclonal antibodies are 
advantageous, they - be actively selected from *rge 

-Tvr":rrr, t ~;reorri:, p u 

have been expressed on the surface or i 
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creating a "single pot- in vitro immune systeia of antxbodies 
available for the selection of monoclonal antibodies 
(Griffiths et al., 1994, EMBO J. 13:3245-3260). Selection of 
antibodies from such libraries can be done by techniques 
5 known in the art, including contacting the phage to 

immobilized target protein, selecting and cloning phage bound 
to the target, and subcloning the sequences encoding the 
antibody variable regions into an appropriate vector 
expressing a desired antibody format. 
10 According to the invention, techniques described for the 

production of single chain antibodies (U.S. patent 4,946,778) 
can be adapted to produce single chain antibodies specific to 
the target protein. An additional embodiment of the 
invention utilizes the techniques described for the 
15 construction of Fab expression libraries (Huse et al. , 1989, 
Science 246: 1275-1281) to allow rapid and easy 
identification of monoclonal Fab fragments with the desired 
specificity for the target protein. 

Antibody fragments that contain the idiotypes of the 
20 target protein can be generated by techniques known in the 
art. For example, such fragments include, but are not 
limited to: the F(ab') 2 fragment which can be produced by 
pepsin digestion of the antibody molecule; the Fab' fragments 
that can be generated by reducing the disulfide bridges of 
25 the F(ab'), fragment, the Fab fragments that can be generated 
by treating the antibody molecule with papain and a reducing 
agent, and Fv fragments. 

in the production of antibodies, screening for the 
desired antibody can be accomplished by techniques known in 
30 the art, e.g., ELISA (enzyme- linked immunosorbent assay). To 
select antibodies specific to a target protein, one may assay 
generated hybridomas- or a phage display antibody library for 
an antibody that binds to the target protein. 



35 
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Mo fhoH. of unifying ProtPln Activities 

Methods of directly modifying protein actxvxtxes 
include, inter alia, dominant negative mutations, specific 
drugs (used in the sense of this application) or chemical 
5 moieties generally, and also the use of antibodies, as 

previously discussed. 

Dominant negative mutations are mutations to endogenous 
genes or mutant exogenous genes that when expressed in a cell 
disrupt the activity of a targeted protein species. 
XO Depending on the structure and activity of the targeted 
protein, general rules exist that guide the selectxon of an 
appropriate strategy for constructing dominant negative 
mutations that disrupt activity of that target (Hershkowitz , 
1987, Nature 329:219-222). In the case of active monomerxc 
15 forms, over expression of an inactive form can cause 

competition for natural substrates or ligands sufficient to 
significantly reduce net activity of the target protein. 
Such over expression can be achieved by, for example, 
associating a promoter, preferably a controllable or 
20 inducible promoter, of increased activity with the mutant 
gene. Alternatively, changes to active site residues can be 
Le so that a virtually irreversible association occurs with 
the target ligand. Such can be achieved with certaxn 
tyrosine kinases by careful replacement of active sxte serxne 
25 residues (Perlmutter et al., 1996, Current Opinion xn 
Immunology 8:285-290). 

in the case of active multimeric forms, several 
strategies can guide selection of a dominant negative mutant. 
Multimeric activity can be controllably decreased by 
30 expression of genes coding exogenous protein fragments that 
bind to multimeric association domains and prevent multxmer 
formation. Alternatively, controllable over expression of an 
inactive protein unit of a particular type can tie up wxld- 
type active units in inactive multimers, and thereby decrease 
,5 multimeric activity (Hocka et al., 1990, The EMBO 3 9:1805- 
1813) . For example, in the case of dimeric DNA bxndxng 
proteins, the DNA binding domain can be deleted from the DNA 
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binding unit, or the activation domain deleted from the 
activation unit. Also, in this case, the DNA binding domain 
unit can be expressed without the domain causing association 
with the activation unit. Thereby, DNA binding sites are 
5 tied up without any possible activation of expression. In 
the case where a particular type of unit normally undergoes a 
conformational change during activity, expression of a rigid 
unit can inactivate resultant complexes. For a further 
example, proteins involved in cellular mechanisms, such as 
10 cellular motility, the mitotic process, cellular 

architecture, and so forth, are typically composed of 
associations of many subunits of a few types. These 
structures are often highly sensitive to disruption by 
inclusion of a few monomeric units with structural defects. 
15 Such mutant monomers disrupt the relevant protein activities 
and can be controllably expressed in a cell. 

in addition to dominant negative mutations, mutant 
target proteins that are sensitive to temperature (or other 
exogenous factors) can be found by mutagenesis and screening 
20 procedures that are well-known in the art. 

Also, one of skill in the art will appreciate that 
expression of antibodies binding and inhibiting a target 
protein can be employed as another dominant negative 
strategy. 

25 

rvr-nqc; 0 f specific know n action 

Finally, activities of certain target proteins can be 
controllably altered by exposure to exogenous drugs or 
ligands. In a preferable case, a drug is known that 

30 interacts with only one target protein in the cell and alters 
the activity of only that one target protein. Graded 
exposure of a cell to varying amounts of that drug thereby 
causes graded perturbations of pathways originating at that 
protein. The alteration can be either a decrease or an 

35 increase of activity. Less preferably, a drug is known and 
used that alters the activity of only a few (e.g., 2-5) 
target proteins with separate, distinguishable, and non- 
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n 4n« effects Graded exposure to such a drug causes 

^"urba^us to the several pathways originating at 
the target proteins. 

5 5 fytPAfiimEME* ™ METHODS 

5 Drug response and pathway responses are obtained for use 

in the instant invention by measuring the cellular 
constituents changed by «g exposure or by pathway 
Perturbation. These cellular characteristics can be of any 
„ a^ct of the biological state of a cell. They can be o£ the 
transcriptional state, in which *»* abundances are measured, 
the translation state, in which protein abundances are 
^sured. the activity state, in which protein actxviti es are 
measured. The cellular characteristics can also be or m!Xed 
16 Tpects, for example, in which the activities of one or more 
" proteins originating a particular biological 

measured along with the » abundances (gene expresses, 
cellular constituents in the pathway downstream of the 
originating protein(s) . This section describes exemplary 
meatus for measuring the cellular constituent, in drug or 
paS.ay responses. This invention is adaptable to other 
methods of such measurement. 

sediments of this invention based on measurxng the 
transcriptional state of drug and pathway responses are 
„ Referred. The transcriptional state can be measured by 
Lhnigues of hybridisation to arrays of nuclerc acid or 
Llei^d mimic prob~. described in the »-t subsection, 
or by other gene expression technologies, descried in the 
subsequent subsection. However measured, the result is 
,0 rfspTe data including values representing » abundanc ■ 
ratios, which usually reflect m expression ratios (» the 
absence of differences in H.* degradation rates), such 
measurement methods are described in Section S- 5 -*" 

in various alternative embodiments of the present 
S5 invention, aspects of the biological 

transcriptional state, such as the transl.txon.1 state the 
activity state, or mixed aspects can be measured. Detaxls of 
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these embodiments are described in this section. Such 
measurement methods are described in Section 5.5.2. 

5.5.1 TO ^«nnTPTIOM' T - MKASPREMgMT 
5 Preferably, measurement of the transcriptional state is 

made by hybridization to transcript arrays, which are 
described in this subsection. Certain other methods of 
transcriptional state measurement are described later xn thxs 
subsection. 

10 

Tr-*n«r.rip« - »|Ti-avs Generally 

in- a preferred embodiment the present inventxon makes 
use of "transcript arrays" (also called herein 
.•microarrays"). Transcript arrays can be employed for 
« analyzing the transcriptional state in a cell, and especially 
for measuring the transcriptional states of a cells exposed 
to graded levels of a drug of interest or to graded 
perturbations to a biological pathway of interest. 

in one embodiment, transcript arrays are produced by 
20 hybridizing detectably labeled polynucleotides representing 
the mHNA transcripts present in a cell (e.g., f luorescently 
labeled cDNA synthesized from total cell mRNA) to a 
microarray. A microarray is a surface with an ordered array 
of binding (e.g., hybridization) sites for products of many 
25 of the genes in the genome of a cell or organism, preferably 
ffi ost or almost all of the genes. Microarrays can be made xn 
a number of ways, of which several are described below 
However produced, microarrays share certain characterxstxcs : 
The arrays are reproducible, allowing multiple copxes of a 
30 given array to be produced and easily compared with each 
other. Preferably the microarrays are small, usually smaller 
than 5 cm*, and they are made from materials that are stable 
under binding (e.g. nucleic acid hybridization) condxtxons. 
A given binding site or unique set of binding sxtes xn the 
35 microarray will specifically bind the product of a sxngle 
gene in the cell. Although there may be more than one 
physical binding site (hereinafter .'site.') per specxfxc mRNA, 
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for the sake of clarity the discussion below will assume that 

there is a single site. 

It will be appreciated that when cDNA complementary to 
the RNA of a cell is made and hybridized to a microarray 
5 under suitable hybridization conditions, the level of 

hyb ridization to the site in the. array correspond^ , to any 
particular gene will reflect the prevalence xn the cell of 
xoPNA transcribed from that gene. For example, when 
detectably labeled (e.g., with a fluorophore, cDNA 
io complementary to the total cellular mPNA is hybridized to a 
microarray, the site on the array corresponding to a gene 
(i e., capable of specifically binding the product of the 
g ene that is not transcribed in the cell will have little or 
L signal (e.g., fluorescent signal), and a gene for whxch 
15 the encoded mRNA is prevalent will have a relatxvely strong 

Si9na in preferred embodiments, cDNAs from two different cells 
are hybridized to the binding sites of the microarray. In 
the case of drug responses one cell is exposed to a drug and 
20 another cell of the same type is not exposed to the drug. In 
the case of pathway responses one cell is exposed to a 
pathway perturbation and another cell of the same type xs not 
exposed to the pathway perturbation. The cDNA 
each of the two cell types are differently labeled so that 
25 they can be distinguished. In one embodiment, for example, 
COHA from a cell treated with a drug (or exposed a Pathway 
perturbation) is synthesized using a f luorescexn-labeled 
dNTP, and cDNA from a second cell, not drug-exposed, is 
synthesized using a rhodamine-labeled dNTP. When the two 
30 CDNAs are mixed and hybridized to the microarray ^ 

relative intensity of signal from each cDNA set xs determined 
for each site on the array, and any relative difference xn 
abundance of a particular mRNA detected. 

in the example described above, the cDNA from the drug 
35 treated (or pathway perturbed) cell will fluoresce green when 
the fluorophore is stimulated and the cDNA from the untreated 
cell will fluoresce red. As a result, when the drug 
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treatment has no effect, either directly or indirectly, on 
the relative abundance of a particular mRNA in a cell, the 
roRNA will be equally prevalent in both cells and, upon 
reverse transcription, red-labeled and green-labeled cDNA 
5 will be ; equally prevalent. When hybridized to the 

micrCarray, the binding site(s) for that species of RNA will 
emit wavelengths characteristic of both f luorophores (and 
appear brown in combination) . In contrast, when the drug- 
exposed cell is treated with a drug that, directly or 
10 indirectly, increases the prevalence of the mRNA in the cell, 
the ratio of green to red fluorescence will increase. When 
the drug decreases the mRNA prevalence, the ratio will 
decrease . 

The use of a two-color fluorescence labeling and 
15 detection scheme to define alterations in gene expression has 
been described, e.g., in Shena et al., 1995, Quantitative 
monitoring of gene expression patterns with a complementary 
DNA microarray, Science 270:467-470, which is incorporated by 
reference in its entirety for all purposes. An advantage of 
20 using cDNA labeled with two different f luorophores is that a 
direct and internally controlled comparison of the mRNA 
levels corresponding to each arrayed gene in two cell states 
can be made, and variations due to minor differences in 
experimental conditions (e.g., hybridization conditions) will 
25 not affect subsequent analyses. However, it will be 

recognized that it is also possible to use cDNA from a single 
cell, and compare, for example, the absolute amount of a 
particular mRNA in, e.g., a drug-treated or pathway-perturbed 
cell and an untreated cell. 

30 

Prenarati "n of Mir -roarravs 

Microarrays are known in the art and consist of a 
surface to which probes that correspond in sequence to gene 
products (e.g., cDNAs , mRNAs, cRNAs, polypeptides, and 
35 fragments thereof) , can be specifically hybridized or bound 
at a known position. In one embodiment, the microarray is an 
array (i.e., a matrix) in which each position represents a 
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discrete binding site for a product encoded by a gene (e g , 
a protein or RNA) , and in which binding sites are present for 
p/oducts of most or almost all of t*e genes in the organs' s 
genome. In a preferred embodiment, the "binding site 

5 (hereinafter, -site", is a nucleic acid or nucleic acid 

analogue to which a particular cognate cDNA can specifically 
hybrifize. The nucleic acid or analogue of the binding site 
can be, e.g., a synthetic oligomer, a full-length cDNA, a 
less-than full length cDNA, or a gene fragment. 

l0 Although in a preferred ernbodiment the microarray 

contains binding sites for products of all or 
ge „es in the target organism's genome, such comprehensiveness 
is not necessarily required. Usually the microarray will 
Lve binding sites corresponding to at least about 50% of the 
15 genes in the genome, often at least about 75%, more . often at 
least about 85%, even more often more than about 90%, and 
mos t often at least about 99%. Preferably, 

has binding sites for genes relevant to the action of a drug 
of interest or in a biological pathway of interest. A "gene 
20 is identified as an open reading frame (ORF) of P^ erab f £ 
Xeast 50, 75, or 99 amino acids from which ^ -ssenger H»A is 
transcribed in the organism (e.g., if a single cell) or in 
Tome cell in a multicellular organism. The number of genes 
in a genome can be estimated from the number of mRNAs 
25 expressed by the organism, or by extrapolation from a well- 
chlracterized portion of the genome. When the genome o 
organism of interest has been sequenced, the number of ORFs 
can be determined and mRNA coding regions identified by 
analysis of the DNA sequence. For example, the Saccharomyces 
30 cerevisiae genome has been completely sequenced and is 
reported to have approximately 6275 open reading 
(ORFs) longer than 99 amino acids. Analysis of these ORFs 
indicates that there are 5885 ORFs that are likely 
Protein products (Coffeau et al., Life with 6000 genes, 

35 science 274:546-567, which is incorporated by reference m 
its entirety for all purposes). In contrast, the human 
genome is estimated to contain approximately 10 genes. 



- 82 - 



WO 99/5870$ 



PCT/US99/10050 



. . ^K^iHisres is usually a nuclexc 

n „ nate cdna specifically hybridizes is 

acid or nucleic acid analogue attache, at that binding site, 
s » one embodiment, the binding sites of the «<=™ "° 
DH» polynucleotides corresponding to at least a portion of 
e.oh\ene in an organism, genome These «- « - 
obtained by, e.g.. polymerase oha„ re.ct.on (WW 
amplification of gene segments from genomic DHX, <e.,., 
i. w BT-PCR) . or cloned sequences. PCR primers are chosen, 
10 by kt n«i , „»«•■ or cDHA, that result 

T Sf^f^f egmentT ate. fr.g»»ts that do 
rST^ 10 Tases Of contiguous identic. seguence 
with any other fragment on the microarray) . Computer 
15 programs are useful in the design of primers wxth the 

reguC specificity and optima! amplification P°~«>~- 

e.g., oligo version 5.0 (national Biosciences). In t*e 
ca ofUnding sites corresponding to very J^*. 
*U sometimes be desirable to amplify segments near the 3 
„ end of the gene so that when oligo-dT primed cDNA probes are 
hytri^ to the microarray, less-than-full length probes 
.Ul bind efficiently. Typically each gene fragment on the 
microarray will be between about 50 bp and about 2000 bp, 
rorHyPlcally between about 100 bp and about 1000 bp^ and 
2 S SL^Ul about 300 bp and about 300 bp in length PCR 
meLods are well known and are described, for example, in 
TnTls et eds., 1990, a^r-l ,- » W** 

the microarray is by synthesis of synthetic polynucleotides 
or oligonucleotides, e.g., using N-phosphonate or 
35 Ltphoramidite chemistries .Prefer et al 

Acid JJes 14:5399-5407; McBride et al., 1983, Terr 

««. 2 4= 2 45- 2 4B,. synthetic seguences are between about 15 
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and about 500 bases in length, more typically between about 
20 and about 50 bases. In some embodiments, synthetxc 
nucleic acids include non-natural bases, e.g., xnosxne As 
noted above, nucleic acid analogues may be used as bxndxng 
. sites for hybridization. An example of a 

acid analogue is peptide nucleic acid (see, e.g., Egholm et 
7l 1993 , PNA hybridizes to complementary oligonucleotxdes 
obeying the Watson-crick hydrogen-bonding rules, Nature 
365:566-568; see also U.S. Patent No. 5,539,083). 
,o in an alternative embodiment, the bxndxng 

(hybridization) sites are made from plasmid or phage clones 
of genes, cDNAs (e.g., expressed sequence tags), or xnserts 

*. =t Differential gene expressxon 

therefrom (Nguyen et al. f 1995, uxirere * 

i„ the murine thymus assayed by quantitative hybridization of 
15 arrayed cDNA donas, Genomes 29=207-209) . In yet another 
embodiment, the polynucleotide of the binding sites xs KKA. 

,rln- to tK- 

The nucleic acid or analogue are attached to a solid 
20 support, which nay be made from glass, plastic <e g., 

polypropylene, nylon). polyacrylamide, nitrocellulose, or 
other materials. A preferred method for attaching the 
nucleic acids to a surface is by printing on glass plates as 
L described generally by schena et al.. X995. »■»**■*»• 
2, monitoring of gene expression patterns with a complementary 
DBA mlcroarr.y, Science 270:407-470. This 
especially useful for preparing .icroarrays of : cDKA. See 
also DeKisi et 1.. 1990. »se of a =DKA microarray to analyse 
gene expression patterns in human cancer. Kature Genetics 
„ 14^60, shalon et al., 1990, A DHA microarray system for 
anting complex OHA samples using two-color 
probe hybridization.. Genome **«. 6=039-045; and ■*->* * 
al 1995, Parallel human genome analysis; microarray-based 
expression of 1000 genes. Proc. Natl. Acad Sci. "SA 
as 93 = 10539-11200. Each of the aforementioned artxcles is 
incorporated by reference in its entirety for all purposes. 
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A second preferred method for making microarrays l. by 
making high-density oligonucleotide arrays. Technic are 
known for producing arrays containing thousands of 
oligonucleotides complementary to defined sequences at 
5 defined locations on a surface using P*°^°~ 991 
techniques for synthesis in situ (see, Fodor at al ^1991. 
Light-directed spatially addressable parallel 
synthesis, science 251:767-773; Pease at al., 1994, Light 
dieted oligonucleotide arrays for rapid DHA serenes 
x. analysts. PrL. Natl. Acad. Sci. OSA 91:5022-5026; M oKhart 
" "al.. 1996. session monitoring by hybridizat on to high- 
density oligonucleotide arrays, nature Biotech 14: 1675, U.S. 
Patent »os. 5,578.832; 5,556.752; and 5.510,270 each of 
which is incorporated by reference in its entirety for all 
16 purposes) or other methods for rapid synthesis and deposition 
oHefined oligonucleotides (Blanchard et al., 1996, Hlgh- 
fenstty oligonucleotide arrays, Biosensors . Bioelectronics 
" 687-90, When these methods are used, oligopeptides 
'.g 20-mere, of Known sequence are synthesized d rectly on 
2 0 a sur ace such as a derivative* glass slide dually, the 
array produced is redundant, with several oligonucleotide 
molecules per RKA. Oligonucleotide probes can be chosen to 
detect alternatively spliced mRHAs. Another preferred method 
o HaXing microarrays is by use of an InkJet « >^ 
25 to synthesize oligonucleotides directly on a solid phase, as 
descried, e.g.. in copending «.S. patent Ration serial 
Ho. 09/008,120 filed on January 16, 1998 by Bl«nol»rd 
entitled "Chemical syndesis Using H ~™^" ' 

which is incorporated by reference herein m its er*irety^ 
,0 other methods for ™*in, microarrays, e.g., by .asking 

(Maskos and Southern, 1992, Kuc. Acids Pes. 20, 1679-1684, , 
may also be used. In principal, any type of array, for 
e»ample, dot blots on a nylon hybridization membrane (see 
sambrook et al.. Molecular Cloning - A ^"'"TZl^ 
» Ed > vol. 1-3. cold spring Harbor Laboratory, cold spring 
ZL ! Hew «rk, 1989. which is incorporated in its entirety 
for all purposes) , could be used, although, as will be 
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recognized by those of skill in the art, very small arrays 
will be preferred because hybridization volumes will be 
smaller. 

5 rcp pftratin g labeled Probes 

Methods for preparing total and poly (A) + RNA are well 
known and are described generally in Sambrook et al., supra. 
I„ one embodiment, RNA is extracted from cells of the various 
types of interest in this invention using guanidinium 
10 thiocyanate lysis followed by CsCl centrifugation (Chirgwin 
et al., 1979, Biochemistry 18:5294-5299). Poly(A) + RNA is 
selected by selection with oligo-dT cellulose (see Sambrook 
et al., supra). Cells of interest include wild-type cells, 
drug-exposed wild-type cells, modified cells, and drug- 
15 exposed modified cells. 

Labeled cDNA is prepared from mRNA by oligo dT-primed or 
random-primed reverse transcription, both of which are well 
known in the art (see e.g., Klug and Berger, 1987, Methods 
Enzymol. 152:316-325). Reverse transcription may be carrxed 
20 out in the presence of a dNTP conjugated to a detectable 
label, most preferably a f luorescently labeled dNTP. 
Alternatively, isolated mRNA can be converted to labeled 
antisense RNA synthesized by in vitro transcription of 
double-stranded cDNA in the presence of labeled dNTPs 
25 (Lockhart et al., 1996, Expression monitoring by 

hybridization to high-density oligonucleotide arrays, Nature 
Biotech. 14:1675, which is incorporated by reference in its 
entirety for all purposes) . In alternative embodiments, the 
cDNA or RNA probe can be synthesized in the absence of 
30 detectable label and may be labeled subsequently, e.g., by 
incorporating biotinylated dNTPs or rNTP, or some similar 
means (e.g., photo-cross-linking a psoralen derivative of 
biotin to RNAs) , followed by addition of labeled streptavidin 
(e.g., phycoerythrin-conjugated streptavidin) or the 

35 equivalent. 

When f luorescently-labeled probes are used, many 
suitable fluorophores are known, including fluorescein, 
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lissamine, phycoerythrin, rhodamine (PerKin Elmer Cetus) , 
CV2 Cy3, Cy3.5, Cy5, Cy5.5, cy7, FluorX (Amersham) and 

Nonisotopic DMA Probe 
others (see, e.g., Kricka, im. won 

Techniques, Academic Press San Diego, CA) . It will be 
5 a^reclate; that pairs of fluorophores are chosen that have 
distinct emission spectra so that they can be easily 

^TlnoSer embodiment, a label other than a fluorescent 
label is used. For example, a radioactive label, or a pair 
10 of radioactive labels with distinct emission spectra, can be 
US ed (see Zhao et al., 1-5. Hi* density 
analysis: a novel approach for large-scale, quantitative 
analysis of gene expression, Gene 156:207; Pietu et al., 
1996 , Novel gene transcripts preferentially expressed in 
« human muscles revealed by quantitative hybridization of a 
high density cDNA array, Genome Res. 6:492). However 
because of scattering of radioactive particles, and^the 
consequent requirement for widely spaced binding sites, use 
of radioisotopes is a less-preferred embodiment. 
20 in one embodiment, labeled cDNA is synthesized by 

incubating a mixture containing O.S m« dCTP, 

pluS 0.1 mM dTTP plus fluorescent deoxyribonucleotides (e^g. , 
0.1 mK Rhodamine 110 UTP (Perken El»er Cetus) or 0.1 mM Cy3 
dUTP (Amersham)) with reverse transcriptase (e.g., 
25 Superscript- II, LTI Inc.) at 42- C for 60 min. 

m,h>H<Hzat «™ Minroarrays 

Nucleic acid hybridization and wash conditions are 
chosen so that the probe "specifically binds- or 
30 -specifically hybridizes" to a specif ic array 

the probe hybridizes, duplexes or binds to a sequence array 
site with a complementary nucleic acid sequence but does not 
hybridize to a site with a non-complementary nucleic acid 
sequence. As used herein, one polynucleotide sequence is 
35 considered complementary to another when, if the s 

the polynucleotides is less than or equal to 25 bases, there 
are no mismatches using standard base-pairing rules or, 
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the shorter of the polynucleotides is longer than 25 bases, 
there is no more than a 5% mismatch. Preferably, the 
polynucleotides are perfectly complementary (no mismatches) . 
It can easily be demonstrated that specific hybridization 
5 conditions result in specific hybridization by carrying out a 
hybridization assay including negative controls (see, e.g., 
Shalon et al., supra, and Chee et al., supra). 

Optimal hybridization conditions will depend on the 
length (e.g., oligomer versus polynucleotide greater than 200 
10 bases) and type (e.g., R*A, ™A, »») «f labeled probe and 
immobilized polynucleotide or oligonucleotide. General 
parameters for specific (i.e., stringent) hybridization 
conditions for nucleic acids are described in Sambrook et 
al., supra, and in Ausubel et al., 1987, Current Protocols 
15 in Molecular Biology, Greene Publishing and 

Wiley-lnterscience, New York, which is incorporated xn rts 
entirety for all purposes. When the cDMA microarrays of 
Schena et al. are used, typical hybridization conditions are 
hybridization in 5 X SSC plus 0.2% SDS at 65« C for 4 hours 
20 followed by washes at 25- C in low stringency wash buffer (1 
X SSC plus 0.2% SDS) followed by 10 minutes at 25' C in high 
stringency wash buffer (0.1 X SSC plus 0.2% SDS) (Shena et 
al., 1996, Proc. Natl. Acad. Sci. USA, 93:10614). Useful 
hybridization conditions are also provided in, e.g., 
25 Tijessen, 1993, u y^^al-.ion With Nucleic Acid Probes , 
Elsevier Science Publishers B.V. and Kricka, 1992, 
Nonisotopic DNA Probe Techniques, Academic Press San Diego, 



CA. 



30 si gnal D e <-»nf-<on afnri nafca Analysis 

When fluorescently labeled probes are used, the 
fluorescence emissions at each site of a transcript array can 
be, preferably, detected by scanning confocal laser 
microscopy. In one embodiment, a separate scan, using the 

35 appropriate excitation line, is carried out for each of the 
two fluorophores used. Alternatively, a laser can be used 
that allows simultaneous specimen illumination at wavelengths 
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specific to the two fluorophores and emissions from the two 
fluorophores can be analyzed simultaneously (see Shalon et 
al., 1996, A DNA microarray system for analyzing complex DMA 
samples using two-color fluorescent probe hybridization, 
5 Genome Research 6:639-645, which is incorporated by reference 
in its entirety for all purposes) . In a preferred 
embodiment, the arrays are scanned with a laser fluorescent 
scanner with a computer controlled X-Y stage and a microscope 
objective. Sequential excitation of the two fluorophores is 
10 achieved with a multi-line, mixed gas laser and the emitted 
light is split by wavelength and detected with two 
photomultiplier tubes. Fluorescence laser scanning devices 
are described in Schena et al., 1996, Genome Res. 6:639-645 
and in other references cited herein. Alternatively, the 
15 fiber-optic bundle described by Ferguson et al., 1996, Nature 
Biotech. 14:1681-1684, may be used to monitor mRNA abundance 
levels at a large number of sites simultaneously. 

Signals are recorded and, in a preferred embodiment, 
analyzed by computer, e.g., using a 12 bit analog to digital 
20 board, in one embodiment the scanned image is despeckled 
using a graphics program (e.g., Hijaak Graphics Suite) and 
then analyzed using an image gridding program that creates a 
spreadsheet of the average hybridization at each wavelength 
at each site. If necessary, an experimentally determined 
25 correction for "cross talk" (or overlap) between the channels 
for the two fluors may be made. For any particular 
hybridization site on the transcript array, a ratio of the 
emission of the two fluorophores can be calculated. The 
ratio is independent of the absolute expression level of the 
30 cognate gene, but is useful for genes whose expression is 
significantly modulated by drug administration, gene 
deletion, or any other tested event. 

According to the method of the invention, the relative 
abundance of an mRNA in two cells or cell lines is scored as 
35 a perturbation and its magnitude determined (i.e., the 

abundance is different in the two sources of mRNA tested) , or 
as not perturbed (i.e., the relative abundance is the same). 
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A s used herein, a difference between the two sources of RNA 
of at least a factor of about 25% (RNA from one source is 25% 
*ore abundant in one source than the other source) , more 
usually about 50%, even more often by a factor of about 2 
5 (twice as abundant) , 3 (three times as abundant) or 5 (five 
times as abundant) is scored as a perturbation. Present 
detection methods allow reliable detection of difference of 
an order of about 3-fold to about 5-fold, but more sensitive 
methods are expected to be developed. 
10 Preferably, in addition to identifying a perturbation as 

positive or negative, it is advantageous to determine the 
magnitude of the perturbation. This can be carried out, as 
noted above, by calculating the ratio of the emission of the 
two fluorophores used for differential labeling, or by 
15 analogous methods that will be readily apparent to those of 
skill in the art. 

Moae nr<»inenr Pathway Fesponses 

in one embodiment of the invention, transcript arrays 
20 reflecting the transcriptional state of a cell of interest 
are made by hybridizing a mixture of two differently labeled 
probes each corresponding (i.e., complementary) to the mRNA 
of a different cell of interest, to the microarray. 
According to the present invention, the two cells are of the 
25 same type, i.e., of the same species and strain, but may 
differ genetically at a small number (e.g., one, two, three, 
or five, preferably one) of loci. Alternatively, they are 
isogeneic and differ in their environmental history (e.g., 
exposed to a drug versus not exposed) . 
30 in order to measure pathway responses, cells are 

prepared or grown in the presence of graded perturbations to 
a pathway of interest. The cells exposed to the perturbation 
and cells not exposed to the perturbation are used to 
construct transcript arrays, which are measured to find the 
35 mRNAs with modified expression and the degree of modification 
due to exposure to the drug. Thereby, the pathway response 
is obtained . 
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The density of levels of the graded drug exposure and 
graded perturbation control parameter is governed by the 
sharpness and structure in the individual gene responses - 
the steeper the steepest part of the response, the denser the 
5 levels needed to properly resolve the response. This 

exemplary density is approximately indicated by the example 
of Fig. 3. There, six exposures to methotrexate over a 
hundred-fold range of concentrations was just sufficient to 
resolve the gene expression responses. However, more 
10 exposures are preferably to more finely represent this 
pathway. 

Further, it is preferable in order to reduce 
experimental error to reverse the fluorescent labels in two- 
color differential hybridization experiments to reduce biases 
15 peculiar to individual genes or array spot locations. In 
other words, it is preferable to first measure gene 
expression with one labeling (e.g., labeling perturbed cells 
with a first fluorochrome and unperturbed cells with a second 
fluorochrome) of the mRNA from the two cells being measured, 
20 and then to measure gene expression from the two cells with 
reversed labeling (e.g., labeling perturbed cells with the 
second fluorochrome and unperturbed cells with the first 
fluorochrome) . Multiple measurements over exposure levels 
and perturbation control parameter levels provide additional 
25 experimental error control. With adequate sampling a trade- 
off may be made when choosing the width of the spline 
function S used to interpolate response data between 
averaging of errors and loss of structure in the response 
functions. Approximately ten measurements over drug exposure 
30 and perturbation control parameter intervals, repeated with 
reversal of the fluorescent labels, which together require 
approximately 20 hybridization experiments per drug response 
or perturbation response, achieve reliable identification of 
pathways and their member genes and proteins. 

35 

Measurement of D mo Response Data 
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To measure drug response data, the cells are exposed to 
graded levels of the drug or drug candidate of interest. 
When the cells are grown in vitro, the compound is usually 
added to their nutrient medium. In the case of yeast, it is 

5 preferable to harvest the yeast in early log phase, since 
expression patterns are relatively insensitive to time of 
harvest at that time. The drug is added is a graded amount 
that depends on the particular characteristics of the drug, 
but usually will be between about 1 ng/ml and 100 mg/ml. In 

10 some cases a drug will be solubilized in a solvent such as 
DMSO. 

The cells exposed to the drug and cells not exposed to 
the drug are used to construct transcript arrays, which are 
measured to find the mRNAs with altered expression due to 
15 exposure to the drug. Thereby, the drug response is 
obtained. 

Similarly for measurements of pathway responses, it is 
preferable also for drug responses, in the case of two-color 
differential hybridization, to measure also with reversed 
20 labeling. Also, it is preferable that the levels of drug 
exposure used proved sufficient resolution (e.g., by using 
approximately 10 levels of drug exposure) of rapidly changing 
regions of the drug response. 

25 ™-Ho»- Method s ~* T^ngcriotional State Measurement 

The transcriptional state of a cell may be measured by 
other gene expression technologies known in the art. Several 
Buch technologies produce pools of restriction fragments of 
limited complexity for electrophoretic analysis, such as 

30 methods combining double restriction enzyme digestion with 
phasing primers (see, e.g., European Patent O 534858 Al, 
filed September 24, 1992, by Zabeau et al.), or methods 
selecting restriction fragments with sites closest to a 
defined mRNA end (see, e.g., Prashar et al., 1996, Proc. 

35 Natl. Acad. Sci. USA 93:659-663). Other methods 

statistically sample cDNA pools, such as by sequencing 
sufficient bases (e.g., 20-50 bases) in each of multiple 
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cDNAs to identify each cDNA, or by sequencing short tags 
(e g 9-10 bases) which are generated at known positions 
relative to a defined mRNA end (see, e.g., Velculescu, 1995, 
Science 270:484-487). 

5 s.5.2 w.'A BTTREMEMT Q » ™™ ASPEfiTfl OF PTOfrQOICM, STATE , 

in various embodiments of the present invention, aspects 
of the biological state other than the transcriptional state, 
such as the translational state, the activity state, or mixed 
10 aspects can be measured in order to obtain drug and pathway 
responses. Details of these embodiments are described in 
this section. 

„wwH«mfc , «n Transitional State Measurements 

15 Measurement of the translational state may be performed 

according to several methods. For example, whole genome 
monitoring of protein (i.e., the "proteome," Goffeau et al., 
supra) can be carried out by constructing a microarray m 
which binding sites comprise immobilized, preferably 
20 monoclonal, antibodies specific to a plurality of protein 
species encoded by the cell genome. Preferably, antibodies 
are present for a substantial fraction of the encoded 
proteins, or at least for those proteins relevant to the 
action of a drug of interest. Methods for making monoclonal 
25 antibodies are well known (see, e.g., Harlow and Lane, 1988. 

A x-w^n-rv Manual. Cold Spring Harbor, New 
York, which is incorporated in its entirety for all 
purposes) . In a preferred embodiment, monoclonal antibodies 
are raised against synthetic peptide fragments designed based 
30 on genomic sequence of the cell. With such an antibody 

array, proteins from the cell are contacted to the array, and 
their binding is assayed with assays known in the art. 

Alternatively, proteins can be separated by two- 
dimensional gel electrophoresis systems Two-dimensional gel 
35 electrophoresis is well-known in the art and typically 
involves iso-electric focusing along a first dimension 
followed by SDS-PAGE electrophoresis along a second 
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dimension. See, e.g., Hames et al, 1990, Gp\ Elec trophoresis 
~* ornf^ns: & Practical Approach, IRL Press, New York; 
Shevchenko et al., 1996, Proc. Nat'l Acad. Sci. USA 93:1440- 
1445; Sagliocco et al., 1996, Yeast 12:1519-1533; Lander, 
5 1996, science 274:536-539. The resulting electropherograms 
can be analyzed by numerous techniques, including mass 
spectrometry techniques, western blotting and immunoblot 
analysis using polyclonal and monoclonal antibodies, and 
internal and N-terminal micro-sequencing. Using these 
10 techniques, it is possible to identify a substantial fraction 
of all the proteins produced under given physiological 
conditions, including in cells (e.g., in yeast) exposed to a 
drug, or in cells modified by, e.g., deletion or over- 
expression of a specific gene. 



15 



^vw,^^ R^Rd on m-hPr Aspects of the Biological State 
Although monitoring cellular constituents other than 
mRNA abundances currently presents certain technical 
difficulties not encountered in monitoring mRNAs, it will be 
20 apparent to those of skill in the art that the use of methods 
of this invention, including application of various known 
methods of pathway perturbation, are applicable to any 
cellular constituent that can be monitored. 

In particular, where activities of proteins relevant to 
25 the characterization of drug action can be measured, 
embodiments of this invention can be based on such 
measurements. Activity measurements can be performed by any 
functional, biochemical, or physical means appropriate to the 
particular activity being characterized. Where the activity 
30 involves a chemical transformation, the cellular protein can 
be contacted with the natural substrate (s) , and the rate of 
transformation measured. Where the activity involves 
association in multimeric units, for example association of 
an activated DNA binding complex with DNA, the amount of 
3S associated protein or secondary consequences of the 

association, such as amounts of mRNA transcribed, can be 
measured. Also, where only a functional activity is known, 
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for example, as in cell cycle control, performance of the 
function can be observed. However known and measured, the 
changes in protein activities form the response data analyzed 
by the foregoing methods of this invention. 
5 in alternative and non-limiting embodiments, response 

data may be formed of mixed aspects of the biological state 
of a cell- Response data can be constructed from, e.g., 
changes in certain raRNA abundances, changes in certain 
protein abundances, and changes in certain protein 
10 activities. 

5.6 APPLICATIONS TO DP nc DTSCOVERY 

The present invention has numerous applications in the 
field of drug discovery, some of which are presented herein. 
15 in one application, the present invention provides a method 
for determining other biological pathway of action of a 
candidate drug for which a putative pathway of action has 
already been identified are determined. As noted supra, drug 
development often involves testing numerous compounds for a 
20 specific effect on a known biological pathway, such as a 
pathway originating at a cloned gene sequence or isolated 
enzyme or protein. In this process, drug candidates that 
apparently affect the putative pathway are identified, but 
little or no information is generated about the specificity 
25 of the drug (e.g., what other biological pathways are 

affected) , or about the particular effects of the drug on the 
affected pathways. The method of the present invention 
provides this information . 

FO* example, provided with a candidate drug that appears 
30 to affect a putative biological pathway, the methods of the 
present invention can be applied to confirm that the putative 
pathway is indeed a pathway of action of the drug, as well as 
for development of drugs (e.g., such as an ideal drug) that 
are more specific for the putative pathway (i.e., are more 
35 pathway-specific) in that they affect fewer biological 
pathways other than the desired putative pathway. This 
application can be achieved by direct employment of the 

- 95 - 



WO 99/58708 



PCT/US99/10050 



m ethods described generally in Section 5.2 and specifically 
in Section 5.3 (especially with reference to Fig. 5 . 
Accordingly, in one aspect, this is achieved 
measuring drug response data for the drug or candidate of 
5 interest; (ii) measuring the perturbation response for the 
putative biological pathway of drug action (e.g., if the 
"iological pathway originates at a gene, then the expression 
of the gene may be controlled in a graded manner) ; (iii) 
representing the drug response data as best as possible in 
XO terms of pathway response data for the putative pathway of 
drug action; and (iv) assessing the significance of the 
representation to determine whether significant effects of 
the drug have been fully represented and verifying that the 
putative pathway is actually a pathway of action of the drug. 
15 If, as is described in more detail supra with respect to 

step 507, exposing the cell to the drug along with 
TerLba^ons to the putative pathway results in i^ring 
effects on the response of the cellular constituents of the 
pathway, then this indicates that the pathway is indeed a 
20 pathway of action of the drug. In other words, the combined 
Tesponse is assessed to determine whether it is more liKe the 
illustrated in Fig. 7A than that illustrated in Fig. 7B. On 
the other hand, if the effects of combined exposure are 
primarily additive (like that illustrated in Fig. 7B) then 
25 "lis indicates that the putative pathway is not a pathway of 
drug action. Further, if, as is described in more detail 
supra with respect to step 506, the best ^^^^ 
the drug response data by the pathway response data is found 
to be highly significant, for example, by surpassing a 95% 
30 significance threshold, then this indicates that <*« 
candidate drug is highly specific for the putative 
pathway (with few or- no direct effects on other >^ical 
pathways, such as those originating at other genes, or gene 
products, or gene product activities). On the other hand, if 
35 this representation is found to be not sufficiently 
significant, then this indicates that other biological 
pathways are affected by the drug or candidate of interest. 
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In the latter case, in which other biological pathways 
in the cell are affected, the structure of the candidate drug 
may be modified (e.g., using organic synthesis methods well 
known in the arts of pharmaceutical or medicinal chemistry) 
5 or closely related compounds may be identified, or the like, 
and tested according to the present invention until a drug 
that is more pathway-specific (i.e., affecting fewer pathways 
other than the putative pathway) for the putative pathway (or 
even an ideal drug affecting only the putative pathway) is 

10 identified. 

in another application, the methods of this invention 
can be used to select, from a set of candidate compounds, the 
drug or drugs with the highest pathway specificity by 
identifying all the cellular biological pathways of compounds 
15 in the set. Usually, the drug with the highest pathway 
specificity will be the one that directly affects only its 
intended pathway. When the intended pathway is not known, 
the drug that affects the fewest number of pathways is likely 
to be more pathway-specific than a drug that affects a 
20 greater number of pathways, and is a preferred candidate. A 
drug with high specificity (i.e., highly pathway-specific) is 
of interest because such a drug will likely have fewer side 
effects when administered to a patient. 

in further applications, the invention can be used to 
25 identify the pathway (s) of action a drug that has a known 
biological effect on cells (or on patients) , but for which 
the mechanism or pathway of action is not known. By 
identifying the pathway of action of a drug with a desirable 
therapeutic activity it is possible to identify other 
30 compounds having a similar therapeutic activity, as well as 
to identify compounds with greater pathway specificity. In 
such an application,, the drug response data is fit with a 
combination of pathways likely to be affected by the drug, or 
with pathways simply drawn from a compendium of pathways, and 
35 the pathway combination best fitting the drug response 

determined. Conversely, the methods of this invention can be 
used to identify a compound or compounds that affect a 
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particular pre-determined biological pathway in a cell, or 
that affect a particular combination of pathways. In such an 
application, the significance of the best fit of the drug 
response data to the pathway response data (or combxnatxon of 
5 pathway response data) is determined to see if it meets a 
certain threshold of significance. 

in yet a further application, the method is used to 
identify "secondary drug loci." Secondary drug loci are 
cellular constituents of any type (such as genes or gene 
xo products or gene product activities) , that are indirectly 
affected by the administration of a drug. They are 
identified by the fact that they correspond to cellular 
constituents having positive or negative perturbatxons xn the 
pathway response data, but are not directly affected by the 
X5 drug. For example, secondary drug loci include cellular 
constituents in a biological pathway originating at a 
directly-affected target of the drug (excluding the 
originating directly-affected target cellular constxtuent) . 
The identification of secondary drug loci is useful xn drug 
20 design. As discussed above, the homeostatic mechanxsms of 
the cell usually assure that a change in one cellular 
constituent (e.g., gene, or gene product, or gene product 
activity) is compensated for by changes in the expressxon 
and/or activity of other cellular constituents. 
25 Recognition of these compensatory changes provxdes a new 

approach to drug intervention, as follows: Disease can o ten 
be considered the result of abnormal activation of biological 
pathway as a result of abnormal expression of a cellular 
constituent originating that pathway (e.g., a gene of a host 
30 or a pathogen), conventional approaches to drug intervention 
see* to modulate the abnormal pathway activity by actxng at 
this primary originating cellular constituent. However, the 
present method identifies secondary drug loci, which are 
cellular constituents, such as genes or gene products that 
35 drug indirectly affects (e.g., by being part of an affected 
biological pathway) when a pathway is directly affected. 
Using this information, it is possible to identify drugs that 
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affect the secondary cellular constituents, providing 
alternative approaches to treatment (and a much greater array 
of potential pathways for drug action) . 

For example, if in a diseased state cellular constituent 
5 X is under-expressed, the conventional goal of therapy is to 
restore the expression of X, and drugs may b6 identified that 
achieve this result by directly affecting the expression of 
X However, the present method allows identification of 
other cellular constituents having X as a secondary drug 
10 loci, that is these other cellular constituents originate 
pathways including X that are also affected by the action of 
the drug. Corrected expression of element X will thereby 
result from the action of a drug on such pathways. Thus, 
secondary pathways (e.g., those originating at proteins, or 
15 at protein activities) that produce desired therapeutic 
outcomes if inhibited or activated can be identified, and 
drugs can be identified that affect these other pathways to 
achieve the desired therapeutic outcome {e.g., restoring the 
expression of X) , other than by direct effects on X. 
20 in additional applications, the methods of this 

invention can be used to identify biological pathways that 
mediate the therapeutic actions or that mediate the side- 
effects of a drug of interest by comparison of the drug of 
interest with other drugs having similar therapeutic effects. 
25 two drugs are considered to have similar therapeutic effects 
if they both exhibit similar therapeutic efficacy for the 
same disease of disorder in a patient or in an animal disease 
model. Drugs known to have similar, or closely similar, 
therapeutic affects are often found to act on the same 
30 biological pathways. Therefore, the methods of this 

invention can be applied to determine the pathways affected 
by the drug of interest and also of a second drug with 
similar therapeutic effects. Pathways that are common to 
both drugs are those pathways likely to mediate the 
35 therapeutic effects of the drug of interest (and also of the 
second drug). By comparing common pathways determined for 
additional drugs with similar therapeutic effects, the 
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pathways mediating the therapeutic effects of the drug of 
interest can be further narrowed or identified. 

Similarly, pathways affected by a drug that mediate the 
side-effects can be determined by the methods of this 
5 invention. The pathways affected by the drug of interest and 
of a second drug with a similar therapeutic effects are ■ 
determined according to this invention. The pathways of the 
drug of interest that are not also pathways of the second 
drug are likely to be those mediating the side-effects of the 
10 drug of interest. By comparing common pathways determined 
for additional drugs with similar therapeutic effects, 
pathways mediating the side-effects of the drug of interest 
effects can be more certainly identified. Optionally, a more 
pathway-specific derivative of the drug of interest can be 
15 identified by next applying the previous described steps for 
improving the specificity of the drug of interest in order to 
eliminate the pathways mediating side-effects. 

When the cell employed in the methods of this invention 
is a non-human eukaryotic cell, e.g.. a yeast cell, it is 
20 often possible to extrapolate from the effects of the drug in 
the non-human cell to the effect in the human cell. This is 
due in part, to the fact that a large proportion of genes 
have homologous counterparts of similar function in most 
eukaryotes. As noted above, almost half of the proteins 
25 identified as defective in human heritable diseases show 
amino acid similarity to yeast proteins. It has also been 
reported that about 80% of all genes known to cause human 
disease have homologs in C. elegans (-Experts gather to 
discuss technologies being developed for functional genomic 
30 analysis," Genetic Engineering Newsxie, Nov. 15, 1996). 

in yet additional applications, the methods of the 
present invention can be used to ascertain the similarity of 
the effects of different drugs. This application corresponds 
to the particular case wherein the number of pathways scaled 
35 to fit the drug response data is equal to unity, i.e., to 
one Thus, in the particular embodiment, R denotes the 
response of the -.perturbation" drug, referred to herein as 
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Drug R, which is being compared to the response, D, generated 
by the first drug, referred to herein as Drug D. The 
correlation coefficient obtained from Eqns. 8 and 9, or, 
alternatively, the least-squares residual obtained from Eqn. 
5 6, provides a quantitative measure of similarity of the 
effects of the two drugs. 

This method of comparing drug responses is significantly 
superior to correlations based on single-concentration 
roe asurements, for the following reason. The two drugs in 
10 question, Drugs D and R, may have the same pathway of action 
but at different potencies. A measurement at one 
concentration therefore will sample the drug response curves, 
such as the those illustrated in Figure 2A, at only one 
titration level. In general, the titration level 
15 corresponding to a given concentration of Drug D will be 
different from the titration level corresponding to the same 
concentration of Drug R. For example, the measurement of 
Drug D may correspond to Titration level 1 in Figure 2A 
whereas the measurement of Drug R may correspond to Titration 
20 level 4. Changes in only genes Gl and 02 will therefore be 
observed for Drug D, while changes in all of the genes G1-G6 
will be observed for Drug R. Consequently, the similarity of 
the responses of Drugs D and R will not be as readily 
apparent as if the entire response curve from zero 
25 concentration to saturation is sampled for both Drug D and 
Drug R, and the best correlation found via the scaling 
transformation as described in Section 5.3.1, above. 

Such a superior measure of drug similarity may be the 
basis, e.g., for the classification of new compounds into 
30 classes defined by existing compounds, including recognizing 
probable therapeutic or toxic effects based on this 
classification. Such a superior measure of drug similarity 
»ay also- be the basis for the grouping of new or existing 
compounds so as to reduce redundancy of libraries or lists of 
35 compounds, or to support decisions about what screening or 
other action to take with a particular compound. 



- 101 - 



PCT/US99/10050 

WO 99/58708 

6 EXAMPLES 

The following example of pathway perturbations by such 
drugs of known specific actions is presented by way of 
illustration of the previously described invention and are 
not limiting of that description. In this example, pathways 
are defined by the graded exposure of a cell to cyclosporin A 
("CyA") and- to methotrexate ("Mtx") and the. pathway of action 
of an "unknown" drug, herein FK506, is determined. 

By way of background, CyA acts directly to inhibit 
calcineurin, and can be used to define the pathway 
originating at calcineurin. Mtx acts directly to inhibit the 
DHFR (dihydrofolate reductase) protein, and can also be used 
to define the pathway originating at this protein. FK506 is 
also a specific regulator of the calcineurin protein, on 
which it acts via a complex with an FK506 binding protein 
(Cardenas et al . , 1994, Yeast as model T cells. Perspectives 
in Drug Discovery and Design 2:103-126 ) . 

The gene expression measurements illustrated in Figs. 
8A-C were made as detailed below. To generate the 
cyclosporin dose response curves, an overnight starter 
culture of S.cerevisiae strain R563 (Genotype: Mat a ura3- 
52 lys2-801 ade2-101 trpl-A63 his3-A200 Ieu2-Al his3::HIS3) 
was diluted into 200 ml of YAPD plus 10 mM CaCl 2 medium (see, 
e.g., Ausubel et al., eds., 1996, Current Protocols in 
«. 1P n.l a r Biology, John Wiley & Sons, Inc., especially ch. 
13) to an OD 600 of 0.1 and grown at 30°C with 300 rpm shaking. 
After a 30 min, cyclosporin A dissolved in ethanol was added 
to cultures at final concentrations of 60, 30, 15, 6 and 3 
/xg/ml. Control cultures were treated with the same volume of 
just ethanol. Growth was monitored by OD 60fl and cells were 
harvested at OD 6oe =1.4' +/-0.1 by centrifugation for 2 min at 
ambient temperature in a Sorvall RC5C+ centrifuge in a SLA- 
1500 rotor. The supernatant was discarded, the residual 
liquid removed by pipetting, and the cells were resuspended 
in 4 ml RNA Extraction Buffer (0.2 M Tris HCl pH 7.6, 0.5 M 
NaCl, 10 mM EDTA, 1% SDS) . Cells were vortexed for 3 sec to 
resuspend the pellet and then immediately transferred to 50 
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ml conical centrifuge tubes containing 2.5 g baked glass 
beads (425-600 pa) a™* 4 ml phenol : chloroform (50:50 v/v). 
Tubes were vortexed for 2 min in the VWR Multi-tube Vortexer 
at setting 8 prior to centrifugation at 3000 rpra for 5 min at 
5 ambient temperature in a Sorvall Model T600D tabletop 
centrifuge to separate the phases. The aqueous phase was 
reextracted with equal volume of phenol: chloroform (50:50 
v/v) by vortexing for 30 sec at setting 6 followed by 
centrifugation as before. To the aqueous phase was added 2.5 
10 volumes of ethanol and the samples were stored at -80 'C until 
isolation of polyA* mRNA. 

To generate FK506 dose response curves, the above 
procedure was followed except that FK506 dissolved in ethanol 
was added to cultures at final concentrations of 10, 3.1, 
15 1.0, 0.31, 0.10 fig/ml. 

To generate methotrexate dose response curves, an 
overnight starter culture of S.cerevisiae strain BY4741 
(Genotype: Mat a his3A0 leu2D0 ura3A0 metlSAO) was diluted 
into 200 ml of SC medium (see, e.g., Ausubel et al, ch. 13) 
20 to an OD^ of 0.1 and grown at 30»C with 300 rpm shaking. 
After 30 min, methotrexate dissolved in water was added to 
cultures at final concentrations of 200, 100, 50, 25, 6.2, 
and 3.1 MM. Control cultures were treated with the same 
'■ volume of water. The rest of the procedure was as above. 
25 In all cases, polyA + RNA was isolated by oligo-dT 

cellulose chromatography using two selections by standard 
protocols (see, e.g., Sambrook et al. 1989, Molecular Cloning 
a. Laboratory Manual , Cold Spring Harbor Laboratory Press) . 
Two micrograms of polyA* RNA was used in reverse 
30 transcription reactions as previously described in Section 
5.5.1. cDNA was purified and hybridized to poly lysine slides 
as also previously described in Section 5.5.1. Extent of 
hybridization was determined by scanning with a prototype 
multi-frame CCD camera slides produced by Applied Precision, 
35 me. Images were processed by informatics and imported into 
the Inpharma database and analyzed using the MatLab data 
analysis package. 
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Fig. 8C illustrates the drug response data generated by 
a series of FK506 exposures. This figure has values of the 
drug exposure on the horizontal axis, and values of the 
logarithm of the expression ratio of the genes most affected 
5 by FK506 on the vertical axis. Fig. 8A illustrates the 
pathway response data for the pathway originating at the 
calcineurin protein and generated by a series of CyA 
exposures. This figure and Fig. 8B have values of the 
pathway perturbation control parameter, which is in this case 

10 the level of drug exposure, on the horizontal axis, and 

values of the logarithm of the expression ratio of the genes 
most affected by these drugs on the vertical axis. Fig. 8B 
illustrates the pathway response data for the pathway 
originating at the DHFR (dihydrof olate reductase) protein and 

15 generated by a series of Mtx exposures. 

Fk506, the "unknown" drug, is modeled with a linear sum 
of the measured pathway responses resulting from graded 
exposures to CyA or Mtx (separately exposed) , forming a 
composite response involving at least pathways originating at 

20 calcineurin and at DHFR. Fig. 8D shows a graph of the 

correlation coefficient, which is obtained by genome-wide 
correlation of the FK506 responses against the combined 
pathway responses of Cyc A and Methotrexate, against 
different values of the scaling parameter that was applied to 

25 the FK506 data. The correlation coefficient was obtained 
according to the methods outlined in Section 5.3.1, in 
particular according to Eqns. 8 and 9. The approximate 
relative potency of FK506 and Cyc A, which was approximately 
63, was recovered as the location of the correlation peak 

30 illustrated in Fig. 8D. 

Table I lists the set of genes common to FK506 and Cyc A 
responses. These genes were identified as those genes having 
a correlation coefficient between the drug and pathway 
response curves for that gene of at least 0 . 9 at the value of 

35 scaling parameter (63) , which gave the maximum genome-wide 

correlation. These correlation coefficients were computed as 
p k (63) according to Eqn. 9, where the index n k n corresponds to 
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a particular gene. These correlated genes were the same 
whether or not the Mtx pathway response is added to the 
minimization, illustrating the ability to identify a pathway 
within a composite response. 

5 
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TABLE I 



5 



15 



20 



25 



witn 
Methotrexate 


Without 
Methotrexate 




gyp7 


YJL171C 


YJL171C 


CMK2 


CMK2 


YR02 


YR02 


RIM101 


RIM101 


YKL218C 


YKL218C 


YLR414C 


YLR414C 


YDR425W 


YDR425W 


YNL195C 


YNL195C 


YOR385W 


YOR385W 


YOR220W 


YOR220W 


HAC1 


HAC1 


YLR121C 


YLR121C 


hxkl 


hxkl 


YHR097C 


YHR097C 


SUR1 


SDR1 


CWP1 


CWP1 


YBR005W 


YBR005W 


yap3 


yap3 


YMR316W 


YMR316W 


YBR004C 


YBR004C 



This example illustrated the usefulness of the methods 
of this invention in that a maximum value of the scaling 
parameter and consistent sets of identified genes were easily 
identified. 

In this experiment Cyc A and Methotrexate responses were 
added linearly for the purposes of the numerical analysis (as 
previously described) In case of actual simultaneous 
exposure to both drugs non- linear terms may need to be added 
to the model drug responses. 
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7 REFERENCES CITED 
All references cited herein are incorporated herein by 
reference in their entirety and for all purposes to the same 
extent as if each individual publication or patent or patent 
5 application was specifically and individually indicated to be 
incorporated by reference in its entirety for all purposes. 

Many modifications and variations of this invention can 
be made without departing from its spirit and scope, as will 
be apparent to those skilled in the art. The specific 
10 embodiments described herein are offered by way of example 
only, and the invention is to be limited only by the terms of 
the appended claims, along with the full scope of equivalents 
to which such claims are entitled. 
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WHAT IS PT AIMED IS : 

1. A method of identifying biological pathways involved in 
the action of a drug in a cell type comprising: 

(a) providing a drug response of said drug in said cell 
type, said drug response having been obtained by a method 
comprising measuring a plurality of cellular constituents in 
a cell of said cell type at a plurality of levels of exposure 
to said drug; 

(b) representing a model drug response as a combination 
of one or more biological pathway responses in said cell 
type, wherein a biological pathway response in said cell type 
is the product of a method comprising measuring cellular 
constituents of a biological pathway in a cell of said cell 
type at a plurality of levels of a perturbation to said 
biological pathway, and wherein each of said one or more 
biological pathway responses in said combination are. subject 
to an independent scaling transformation; and 

(c) determining best scaling transformations of said 
one or more biological pathway responses which minimize the 
value of an objective function of the difference between said 
provided drug response and said model drug response, 

wherein said combination of said one or more biological 
pathway responses subject to said best scaling 
transformations identifies the biological pathways involved 
in the action of said drug in said cell type. 

2. The method of claim 1 wherein said determining step (c) 
further comprises determining an actual minimized value of 
said objective function, said actual minimized value of said 
objective function being the minimized value of said 
objective function determined from said provided drug 
response and said model drug response, and wherein said 
method further comprises, after said step (c) , a step of 
assessing the statistical significance of said best scaling 
transformations of said one or more biological pathways by a 
method comprising: 
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(d) obtaining an expected probability distribution of 
said minimized values of said objective function; and 

(e) assessing the statistical significance of said 
actual minimized value of said objective function in view of 
said expected probability distribution of minimum values of 
said objective function. 

3. A method of identifying biological pathways involved in 
the action of a drug in a cell type, said method comprising 
determining best scaling transformations of one or more 
biological pathway responses which minimize the value of an 
objective function of the difference between a provided drug 
response and a model drug response, wherein: 

(a) said one or more biological pathway responses are 
the product of a method comprising measuring 
cellular constituents of one or more biological 
pathways in a cell of said cell type at a plurality 
of levels of perturbation to said biological 
pathways ; 

(b) said provided drug response is provided by a method 
comprising measuring a plurality of cellular 
constituents in a cell of the cell type at a 
plurality of levels of exposure to said drug; and 

(c) said model drug response is represented as a 
combination of said one or more biological pathway 

-. responses, each of said one or more biological 

pathway responses in said combination being subject 

: to an independent scaling transformation, 
wherein the combination of said one or more biological 
pathway responses subject to said best scaling 
transformations identifies the biological pathways involved 
in the action of said drug in said cell type. 

4. The method of claim 2 wherein said step of obtaining 
said expected probability distribution of minimum values of 
said objective function comprises the steps of: 
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(i) randomizing said drug response with respect to said 
plurality of levels of drug exposure and randomizing said 
model drug response by randomizing said one or more 
biological pathway responses with respect to said plurality 

5 of levels of perturbation to said one or more biological 
pathways; 

(ii) determining a theoretical minimum value of said 
objective function by finding best scaling transformations of 
said one or more randomized biological pathway responses 

10 which minimize said objective function of the difference 
between said randomized drug response and said randomized 
model drug response; and 

(iii) repeating steps (i) and (ii) to determine a 
plurality of theoretical minimum values, said plurality of 

15 minimum values forming said expected probability distribution 
of minimized values. 

5. The method of claim 1 or 3 further comprising a step of 
verifying that said one or more biological pathways are 
20 biological pathways actually involved in the action of said 
drug in said cell type by a method comprising selecting a 
model response from the group consisting of a first model 
response and a second model response, that behaves most 
similarly to a combined drug-perturbation response, wherein: 
25 (i) said combined drug-perturbation response is 

provided by a method comprising measuring a plurality of 
cellular constituents in a cell of said cell type exposed 
simultaneously to one or more levels of said exposure to said 
drug and to one or more levels of perturbations in said one 
30 or more biological pathways; 

(ii) said first model response comprises said 
combination of said one or more biological pathway responses 
subject to said best scaling transformations evaluated at one 
or more first sums, each said first sum being the sura of one 
3 5 of said one or more levels of drug exposure subject to said 

scaling transformations and one of said one or more levels of 
perturbations to said biological pathways, 
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(iii) said second model response comprises said 
one or more second sums, each said second sum being the sum 
of said drug response evaluated at one of said one or more 
levels of drug exposure and said combination of said one or 
more biological pathway responses subject to said best 
scaling transformations evaluated at one of said one or more 
levels of perturbations to said biological pathways, 

wherein said one or more biological pathways are 
verified as biological pathways actually involved in the 
action of said drug in said cell type if said first model 
response is selected. 



6. The method of claim 1 or 3 further comprising a step of 
assigning a cellular constituent present in said drug 

15 response to the one of said one or more biological pathways 
in which said biological pathway response of said cellular 
constituent subject to its best scaling transformation has 
the greatest correlation with said drug response of said 
cellular constituent. 

20 

7. The method of claim 1 or 3 wherein said scaling 
transformations comprise transforming said levels of drug 
exposure to corresponding levels of said perturbations to 
said biological pathways. 

8. The method of claim 7 wherein said transforming is by a 
linear mapping. 

9. The method of claim 1 or 3 wherein said objective 

30 function is minimized by selecting best sets of parameters 
for said scaling transformations. 

10. The method of claim 1 or 3 wherein said objective 
function comprises a sum of the squares of differences of 

35 said drug response at said levels of exposure to said drug 
and said model drug response at said levels of exposure to 
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said drug which, are further subject to said scaling 
transformations. 

11. The method of claim 1 or 3 wherein said objective 
function comprises the negative of the correlation of said 
drug response and said model drug response. 

12. The method of claim 1 or 3 wherein said biological 
pathway responses are provided at levels of said perturbation 
to said biological pathways by a method comprising 
interpolating said measured values. 

13. The method of claim 12 where said interpolating 
comprises approximation by a sum of spline functions. 

14. The method of claim 12 where said interpolating 
comprises approximation by a Hill function. 

15. The method of claim 1 or 3 wherein said one or more 
biological pathways in said cell type are chosen as being 
those biological pathways likely to be involved in the action 
of said drug in said cell type. 

16. The method of claim 1 or 3 wherein said one or more 
biological pathways are chosen from a compendium of 
biological pathways present in said cell type. 

17. The method of claim 1 or 3 wherein said cell type is 
substantially isogeneic to Saccharomyces cerevisiae. 

18. The method of claim 1 or 3 wherein said cellular 
constituents comprise abundances of a plurality of RNA 
species present in said cell type. 

19. The method of claim 18 wherein the abundances of said 
plurality of RNA species are measured by a method comprising 
contacting a gene transcript array with RNA from a cell of 
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said cell type, or with cDNA derived therefrom, wherein a 
gene transcript array comprises a surface with attached 
nucleic acids or nucleic acid mimics, said nucleic. acids or 
nucleic acid mimics capable of hybridizing with said 
5 plurality of RNA species, or with cDNA derived therefrom. 

20. The method of claim 19 wherein said measuring cellular 
constituents in step (a) is performed by a method comprising 
contacting one or more gene transcript arrays (i) with RNA, 

10 or with cDNA derived therefrom, from said cell of said cell 
type that is exposed to said drug, and (ii) with RNA, or with 
cDNA derived therefrom, from said cell of said cell type that 
is not exposed to said drug, and 

wherein said measuring cellular constituents in step (b) 

15 is performed by a method comprising contacting one or more 
gene transcript arrays (i) with RNA, or with cDNA derived 
therefrom, from said cell of said cell type that is exposed 
to said perturbations to said one or more biological 
pathways, and (ii) with RNA, or with cDNA derived therefrom, 

20 from said cell of said cell type that is not exposed to said 
perturbations . 

21. The method of claim 1 or 3 wherein said cellular 
constituents comprise abundances of a plurality of protein 

25 species present in said cell type. 

22. The method of claim 21 wherein the abundances of said 
plurality of protein species are measured by a method 
comprising contacting an antibody array with proteins from a 

30 cell of said cell type, wherein said antibody array comprises 
a surface.: with attached antibodies, said antibodies capable 
of binding with said plurality of protein species. 

23. The method of claim 21 wherein the abundances of said 
35 plurality of protein species are measured by a method 

comprising performing two-dimensional electrophoresis of 
proteins from a cell of said cell type. 
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24 The method of claim 1 or 3 wherein said cellular 
constituents comprise activities of a plurality of protein 
species present in said cell type. 

25. The method of claim 1 or 3 wherein said one or more 
biological pathways in said cell type comprise biological 
pathways originating at one or more specific cellular 
constituents, and wherein said perturbations to said 
biological pathways are performed by a method comprising 
modifying said one or more specific cellular constituents. 

26. The method of claim 25 wherein said one or more specific 
cellular constituents are modified by a method comprising 
causing expression of said one or more specific cellular 
constituents under the control of a controllable expression 

system, 

27. The method of claim 25 wherein said one or more specific 
cellular constituent are modified by a method comprising 
controllable transfection of genes expressing said one or 
more specific cellular constituents. 

28. The method of claim 25 wherein said one or more specific 
cellular constituents are modified by a method comprising 
controllably decreasing abundances of RNA species encoding 
said one or more specific cellular constituents in a cell of 
said cell type. 

29. The method of claim 28 wherein said method of 
controllably decreasing said abundances of RNA species 
comprises exposing a cell of said cell type to ribozymes 
targeted to cleave said RNA species. 

30. The method of claim 25 wherein said one or more specific 
; cellular constituents are modified by a method comprising 

controllably decreasing the rate of translation of RNA 
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species encoding said one or more specific cellular 
constituents in a cell of said cell type. 

31. The method of claim 30 wherein said method of 
controllably decreasing the rate of translation of RNA 
species comprises exposing a cell of said cell type to 
antisense nucleic acids or antisense nucleic acid mimics that 
hybridize to said RNA species or to DNA encoding said RNA 
species. 

32. The method of claim 25 wherein said one or more specific 
cellular constituents are abundances of protein species or 
activities of protein species, and wherein said one or more 
specific cellular constituents are modified by a method 
comprising controllably decreasing said abundances in a cell 
of said cell type. 

33. The method of claim 32 wherein said method of 
controllably decreasing said abundances comprises causing 
expression in a cell of said cell type of said one or more 
protein species as fusion proteins comprising said protein 
species and a degron, wherein said degron is controllable to 
increase the rate of degradation of said protein species. 

34. The method of claim 32 wherein said method of 
controllably decreasing said abundances comprises exposing a 
cell of said cell type to antibodies, wherein said antibodies 
bind said protein species. 

35. The method of claim 25 wherein said one or more specific 
cellular- constituents are activities of protein species, and 
wherein said one or more specific cellular constituents are 
modified by a method comprising controllably decreasing said 
activities in a cell of said cell type. 

36. The method of claim 35 wherein said method of 
controllably decreasing said activities comprises exposing a 
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cell of said cell type to drugs which directly and 
specifically inhibit said activities of said protein species. 

37. The method of claim 35 wherein said method of 
controllably decreasing said activities comprises exposing a 
cell of said cell type to dominant negative mutant protein 
species, wherein said dominant negative mutant protein 
species are proteins inhibiting said activities. 

38. A method of determining a more pathway- specif ic drug 
candidate from an initial drug candidate comprising: 

(a) identifying the biological pathways involved in the 
action of an initial drug candidate by a method comprising 
determining best scaling transformations of one or more 
biological pathway responses which are the product of a 
method comprising measuring cellular constituents of one or 
more biological pathways in a cell of a cell type at a 
plurality of levels of perturbation to said biological 

pathways, wherein 

(i) said best scaling transformations minimize the 

value of an objective function of the 
difference between a provided drug response for 
said initial drug candidate and a model drug 
response for said initial drug candidate, 

(ii) said provided drug response for said initial 
drug candidate is provided by a method 
comprising measuring a plurality of cellular 
constituents in a cell of the cell type at a 
plurality of levels of exposure to said initial 
drug candidate, and 

(iii) said model drug response for said initial 
drug candidate is represented as a combination 
of said one or more biological pathway 
responses, each of said one or more biological 
pathway responses in said combination being 
subject to an independent scaling 
transformation; 
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(b) modifying the structure of said initial drug 
candidate to produce a modified drug candidate; and 

(c) identifying the biological pathways involved in the 
action of said modified drug candidate by a method comprising 
determining best scaling transformations of said one or more 
biological pathway responses, wherein 

(i) said best scaling transformations- minimize the 
value of an objective function of the 
difference between a provided drug response for 
said modified drug candidate, and a model drug 
response for said modified drug candidate, 

(ii) said provided drug response for said modified 

drug candidate is provided by a method 
comprising measuring a plurality of cellular 
constituents in a cell of the cell type at a 
plurality of levels of exposure to said 
modified drug candidate, and 
(iii) said model drug response for said modified 
drug candidate is represented as a combination 
of said one or more biological pathway 
responses, each of said one or more biological 
pathway responses in said combination being 
subject to an independent scaling 
t r ans format ion ; 
wherein said modified drug candidate is a more 
pathway-specific drug candidate than said initial drug 
candidate if fewer biological pathways are identified in the 
action of said modified drug candidate than in the action of 
said initial drug candidate. 

39. A method of identifying one or more specific biological 
pathways " that are involved in the action of a drug and that 
mediate side-effects of the drug, said method comprising: 

(a) identifying one or more biological pathways 
involved in the action of a first drug by determining best 
scaling transformations of one or more biological pathway 
responses which are the product of a method comprising 
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measuring cellular constituents of one or more biological 
pathways in a cell of a cell type at a plurality of levels of 
perturbation to said biological pathways, wherein 

(i) said best scaling transformations minimize the 
value of an objective function of the 
difference between a provided drug response for 
said first drug, and a model drug response for 
said first drug, 

(ii) said provided drug response for said first 
drug is provided by a method comprising 
measuring a plurality of cellular constituents 
in a cell of the cell type at a plurality of 
levels of exposure to said first drug, and 

(iii) said model drug response for said first drug 
is represented as a combination of said one or 
more biological pathway responses, each of said 
one or more biological pathway responses in 
said combination being subject to an 
independent scaling transformation; 

(b) identifying one or more biological pathways 
involved in the action of a second drug by determining best 
scaling transformations of said one or more biological 
pathway responses, wherein said first and second drugs are 
different and exhibit therapeutic efficacy for the same 
disease or disorder, and wherein 

(i) said best scaling transformations minimize the 
value of an objective function of the 
difference between a provided drug response for 
said second drug and a model drug response for 
said second drug, 

(ii) said provided drug response for said second 
drug is provided by a method comprising 
measuring a plurality of cellular constituents 
in a cell of the cell type at a plurality of 
levels of exposure to said second drug, and 

(iii) said model drug response for said second drug 
is represented as a combination of said one or 
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more biological pathway responses, each of said 
one or more biological pathway responses in 
said combination being subject to an 
independent scaling transformation; and 
(c) identifying specific biological pathways involved 
in the action of said first drug that are different from 
• those biological pathways involved in the action of said 
second drug, thereby identifying one or more specific 
biological pathways that are involved in the action of said 
first drug and that mediate side-effects of said first drug. 

40. A method of identifying one or more specific biological 
pathways that are involved in mediating therapeutic efficacy 
for a disease or disorder, said method comprising: 

(a) identifying one or more biological pathways 
involved in the action of a first drug by determining best 
scaling transformations of one or more biological pathway 
responses which are the product of a method comprising 
measuring cellular constituents of one or more biological 
pathways in a cell of a cell type at a plurality of levels of 
perturbation to said biological pathways, wherein 

(i) said best scaling transformations minimize the 
value of an objective function of the 
difference between a provided drug response for 
said first drug and a model drug response for 
said first drug, 

(ii) said provided drug response for said first 
drug is provided by a method comprising 
measuring a plurality of cellular constituents 
in a cell of the cell type at a plurality of 
levels of exposure to said first drug, and 

' (iii) said model drug response for said first drug 
is represented as a combination of said one or 
more biological pathway responses, each of said 
one or more biological pathway responses in 
said combination being subject to an 
independent scaling transformation; 



119 



WO 99/58708 



PCT/US99/10050 



(b) identifying the biological pathways involved in the 
action of a second drug by determining best scaling 
transformations of said one or more biological pathway 
responses, wherein said first and second drugs are different 
and exhibit therapeutic efficacy for the same disease or 
disorder, and' wherein 

(i) said best scaling transformations minimize the 
value of an objective function of the 
difference between a provided drug response for 
said second drug and a model drug response for 
said second drug, 

(ii) said provided drug response for said second 
drug is provided by a method comprising 
measuring a plurality of cellular constituents 
in a cell of the cell type at a plurality of 
levels of exposure to said second drug, and 

(iii) said model drug response for said second drug 
is represented as a combination of said one or 
more biological pathway responses, each of said 
one or more biological pathway responses in 
said combination being subject to an 
independent scaling transformation; and 

(c) identifying specific biological pathways involved in 
the action of both said first drug and said second drug, 
thereby identifying one or more specific biological pathways 
that are involved in the action of said first drug and that 
mediate therapeutic efficacy for said disease or disorder. 

41. A computer system for identifying biological pathways 
involved in the action of a drug in a cell type comprising a 
processor and a memory coupled to said processor, said memory 
encoding one or more programs, said one or more programs 
causing said processor to perform a method comprising the 
steps of : 

(a) receiving a drug response of said drug in said cell 
type, said drug response comprising measurements of a 
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plurality of cellular constituents in a cell of said cell 
type at a plurality of levels of drug exposure; 

(b) receiving one or more biological pathway responses, 
each of said one or more biological pathway responses 

5 comprising measurements of cellular constituents of said 

biological pathway in a cell of said cell type at a plurality 
of levels of a perturbation to said biological pathway; 

(c) - forming a model drug response as a combination of 
said one or more biological pathway, wherein each of said one 

10 or more biological pathway responses in said combination is 
subject to an independent scaling transformation; 

(d) determining the value of an objective function of 
the difference between said drug response and said model drug 
re spons e ; and 

15 (e) minimizing said determined value of said objective 

function by varying the scaling transformations of said one 
or more biological pathway responses to obtain best scaling 
transformations that minimize said determined value of said 
objective function; 

20 wherein said combination of said one or more biological 

pathways responses subject to said best scaling 
transformations identifies the biological pathways involved 
in the action of said drug in said cell type, 

25 42. The computer system of claim 41 where said steps of 

receiving comprise making said drug response measurements and 
said biological pathway response measurements available in 
said memory. 

30 43. The computer system of claim 41 wherein said forming a 
model drug response . comprises adding said one or more 
biological pathway responses. 

44. The computer system of claim 41 wherein said objective 
35 function comprises a sum of squares of the differences of 
said drug response and said model drug response at said 
levels of drug exposure, said model drug response being 
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provided at said levels of drug exposure by transforming by- 
said scaling transformations said levels of drug exposure to 
corresponding levels of perturbations to each of said 
biological pathways and by interpolating said biological 
5 pathway responses to said corresponding levels of 
perturbat ions . 

45. The computer system of claim 41 wherein said minimizing 
comprises performing ,the Levenberg-Marquandt method. 

10 

46. A method of measuring the similarity of the effects of 
two drugs on a cell type comprising: 

(a) providing a first drug response of a first drug in 
said cell type, said drug response having been obtained by a 

15 method comprising measuring a plurality of cellular 

constituents in a cell of said cell type at a plurality of 
levels of exposure to said first drug; 

(b) providing a second drug response for a second drug 
in a cell type, said second drug response having been 

20 obtained by a method comprising measuring a plurality of 
cellular constituents in a cell of said cell type at a 
plurality of levels of exposure to said second drug; and 

(c) determining the best scaling transformation of said 
biological pathway response which minimizes the value of an 

25 objective function of the difference between said first drug 
response and said second drug response, 

wherein said minimized value of said objective function 
provides a measure of similarity of the effects of said first 
and second drugs in said cell type. 

30 

47. A method of identifying biological pathways involved in 
the effect of an environmental change upon a cell type 
comprising; 

(a) providing an environmental response to said 
35 environmental change upon said cell type, said environmental 
response having been obtained by a method comprising 
measuring a plurality of cellular constituents in a cell of 
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said cell type at a plurality of degrees of severity of said 
environmental change; 

(b) representing a model environmental response as a 
combination of one or more biological pathway responses in 

5 said cell type, wherein a biological pathway response in said 
cell type is the product of a method comprising measuring 
cellular constituents of said biological pathway in a cell of 
said cell type at a plurality of levels of a perturbation to 
said biological pathway, and wherein each of said one or more 
10 biological pathway responses in said combination are subject 
to an independent scaling transformation; and 

(c) determining best scaling transformations of said 
one or more biological pathway responses which minimize the 
value of an objective function of the difference between said 

15 environmental response and said model environmental response, 
wherein said combination of said one or more biological 
pathway responses subject to said best scaling 
transformations identifies the biological pathways involved 
in the effect of said environmental change upon said cell 

20 type. 

48. The method of claim 3 wherein a statistical significance 
is assigned to the combination of said one or more biological 
pathway responses subject to said best scaling 
25 transformations, wherein said statistical significance is 
assigned by a method comprising: 



49. The method of claim 48 wherein said expected probability 
distribution is obtained by a method comprising: 
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(a) 



(b) 



obtaining an expected probability distribution of 
minimized values of said objective function; and 
assessing statistical significance of an actual 
minimized value of said objective function in view 
of said expected probability distribution, wherein 
said actual minimized value of said objective 
function is determined from the provided drug 
response and the model drug response. 



35 
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(a) randomizing said drug response with respect to said 
plurality of levels of drug exposure; 

(b) randomizing said model drug response by a method 
comprising randomizing said one or more biological 
pathway responses with respect to said plurality of 
levels of perturbation to said one or more 
biological pathways; 

(c) determining a theoretical minimum value of said 
objective function by determining best scaling 
transformations of said one or more randomized 
biological pathway responses which minimize an 
objective function of the difference between said 
randomized drug response and said randomized model 
drug response; and 

(d) repeating steps (a) - (c) , so that a plurality of 
theoretical minimum values is thereby determined, 

wherein said plurality of theoretical minimum values forms 
said expected probability distribution. 

50. A method of identifying biological pathways involved in 
the effect of an environmental change upon a cell type, said 
method comprising determining best scaling transformations of 
one or more biological pathway responses which minimize the 
value of an objective function of the difference between a 
provided environmental response and a model environmental 
response, wherein: 

(a) said one or more biological pathway responses are 
the product of a method comprising measuring 
cellular constituents of one or more biological 
pathways in a cell of said cell type at a plurality 
of levels, of perturbation to said biological 
pathways ; 

(b) said provided environmental response is provided by 
a method comprising measuring a plurality of 
cellular constituents in a cell of the cell type at 
a plurality of levels of exposure to said 
environmental change; and 
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(c) said model environmental response is represented a 
a combination of said one or more biological 
pathway responses, each of said one or more 
biological pathway responses in said combination 
being subject to an independent scaling 
t r ans format ion , 
wherein the combination of said one or more biological 
pathway responses subject to said best scaling 
transformations identifies the biological pathways involved 
in the effect of said environmental change upon said cell 
type . 
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